NOTES ON
THE TEACHING OF
STATISTICS IN SCHOOLS


By
B. C. Brookes, M.A.


WITH A FOREWORD BY


Professor E. S. Pearson
C.B.E., M.A., D.Sc.


ILLUSTRATED


HEINEMANN


BY THE SAME AUTHOR 


An Introduction to Statistical Method 
(HEINEMANN) 


See back cover for description 


NOTES ON 
THE TEACHING OF 
STATISTICS IN SCHOOLS 


by 
B. C. BROOKES, M.A. 


Lately Senior Mathematics Master, Bedford Modern School 
Member of the Teaching Committee of the 
Royal Statistical Society 


With a Foreword by 
Professor E. S. Pearson, C.B.E., M.A., D.Sc.


Department of Statistics, University College, London 




WILLIAM HEINEMANN LTD
MELBOURNE : LONDON : TORONTO




PUBLISHED BY 
WILLIAM HEINEMANN LTD 
99 GREAT RUSSELL STREET, LONDON, W.C.1
PRINTED IN GREAT BRITAIN BY BUTLER AND TANNER LTD.
FROME AND LONDON 


FOREWORD 


FIGURES are nowadays being used continually to prove this 
thing or that, yet faulty reasoning is all too common, even 
when the reasoner is trying to be honest. There can be no 
better way of encouraging a balanced adult view of figures than 
to make the boy or girl familiar at school with what has been 
termed ‘the statistical approach’. Developed by easy steps in
which the children produce their own figures by observation, 
subject them to simple manipulation and speculate on their 
meaning, statistics can not only be made fascinating but also 
provide one of the best forms of elementary training in clear 
and critical thinking. 

There are many teachers who are convinced that, given the 
opportunity, they could introduce in this way the simpler 
statistical ideas and problems to children of 14 or 15 as part 
of their general education and in so doing affect their intellec- 
tual approach to other subjects. Already, at a later stage, the 
occasional teaching of statistics as a more specialised Sixth 
Form subject, allied with mathematics, has been recognised by 
the inclusion of optional questions or papers in the General 
Certificate of Education syllabuses of several examining 
authorities. But as yet little help has been given to the pioneer 
teacher in the shape of a text-book or even an elementary set 
of Tables. 

In their recent Report, the Royal Statistical Society’s Com- 
mittee on the Teaching of Statistics in Schools laid emphasis on 
these two aspects of the problem, the general and the specialised,
and it was indeed in connection with discussions in this
committee that Mr. Brookes first prepared the two sections of 
these ‘Notes’. If Section II is a series of connected notes which 
should prove helpful in teaching for the G.C.E. syllabuses, 
Section I is really more than this. It seems to me to provide an
admirable account of how one teacher with imagination would
approach those basic statistical concepts: variation, correla- 
tion, probability, sampling; how he would make his pupils 
collect the data by which to clothe these concepts with reality; 
and how he would use the results to encourage a critical and 
inquiring attitude of mind. 

When the method of introducing a subject is still in the ex- 
perimental stage, each teacher must to a great extent find the 
route best suited to the age and ability of his pupils, but there 
will be few who cannot make some use of the many and varied 
suggestions that Mr. Brookes has provided. Many of the ideas 
which he illustrates are not peculiar to statistics and he has 
realised the importance of taking every opportunity of linking 
up what is new with familiar ideas and methods already taught. 
Indeed, much will be gained if the statistical approach is seen 
to be largely a combination of common sense and scientific 
method under another name! 

E. S. PEARSON, 
Department of Statistics, 
University College, London. 
June 1952. 




PREFACE 


Reasons for teaching Statistics in schools have been given in
a report prepared by the Teaching Committee of the Royal 
Statistical Society, which is obtainable from the Assistant 
Secretary of the Society, 4 Portugal Street, London, W.C.2. 
(Price, 1s.) 

In these notes an attempt has been made to give practical 
detailed guidance to teachers in secondary schools on the
teaching of Statistics at all levels. Section I describes work 
that might be included in the general curriculum: Section II 
is a teaching commentary on the published Statistics syllabuses 
of the G.C.E. examining authorities.

The notes are of course intended only for those who have had
no academic training in, or practical experience of, the subject,
but who nevertheless appreciate that Statistics is of both
practical and educational value. The notes, moreover, are only
suggestions: it is not pretended that they outline a complete
or systematic course. But if by provoking constructive criticism
they help to make Statistics more generally taught in
schools, they will have served a useful purpose.

Professor E. S. Pearson kindly read Section I in manuscript 
and made helpful criticisms for which I am most grateful, but 
the responsibility for any deficiencies that remain is of course 
mine alone. I must also thank Mr. Cyril Bibby for his help and 
suggestions about the application of statistical methods to
biology. 


B. C. B. 


CONTENTS 


FOREWORD
PREFACE

SECTION I
The Teaching of Statistics in the General Curriculum
1. Introduction
2. Diagrams and Charts
   (a) Representation of Class-Data
   (b) Representation of Time-Data
   (c) Trends, Seasonal Movements and Random
       Fluctuations in Time-Charts
   (d) Further Points about Charts
   (e) Maps
3. Percentages, Rates and Ratios
4. Compound Units
5. Averages
6. Index Numbers
7. A Measure of Dispersion
8. Frequency Distributions
9. Correlation
10. Probability and Simple Probability Calculations
11. Sampling
12. Conclusion

SECTION II
Teaching Notes on the G.C.E. Syllabuses in Statistics
1. Introduction
2. Descriptive Statistics
   (a) Tabulation and Graphical Representation
   (b) Frequency Distributions and Histograms
   (c) Averages; Mode, Median, Mean
   (d) Moving Averages
   (e) Weighted Means
   (f) Index Numbers
   (g) Vital Statistics
   (h) Measures of Dispersion
   (j) Standard Measure
   (k) Variance
   (l) Continuous Distributions
3. Probability and Sampling
   (a) The Laws of Probability
   (b) Some Approximations
   (c) The Binomial Distribution
   (d) The Poisson Distribution
   (e) Simple Sampling of Attributes
   (f) The Normal Distribution
   (g) The Rectangular Distribution
   (h) Sampling of Variables: Large Samples
   (j) Population Parameters and Sample Statistics
   (k) Levels of Significance and Confidence Limits
   (l) Sampling of Variables: Small Samples
4. Bivariate Distributions
   (a) Scatter Diagrams
   (b) Linear Regression
   (c) The Product-Moment Correlation Coefficient
   (d) The Rank Correlation Coefficient
5. Simple Contingency Tables

INDEX

SECTION I 


THE TEACHING OF STATISTICS IN THE GENERAL
CURRICULUM 


1. INTRODUCTION 

The purpose of this Section is to give some teaching notes on 
the topics listed in Para. 22 of the Royal Statistical Society’s 
Memorandum on the Teaching of Statistics in Schools. It is 
emphasised that these notes do not pretend to outline a syste- 
matic course of Statistics—for the greater part they merely sug- 
gest extensions of some of the topics that are already taught in 
the arithmetic course usually given to children of the age range 
12 to 14 years in grammar, modern and technical schools. All 
that is proposed is that these statistical extensions can be dis- 
cussed in class as suitable opportunities arise, as they frequently 
do, although an occasional review of a whole section, e.g. dia- 
grams, is recommended as a means of consolidating previous 
work. It is also hoped that the notes will show convincingly 
that the suggested subject matter is not inherently difficult for 
schoolchildren and that it is valuable in relating the work of 
the mathematics classroom to other work and interests of 
children. In schools where the teacher of arithmetic is also 
responsible for the teaching of some science or geography, it 
will be easy for him to select examples for study that combine 
statistical interest with educational value. Elsewhere the 
mathematics teacher should try to enlist the co-operation of 
his scientific colleagues by persuading them to provide him 
with the results of class experiments; the numerical analysis 
can be carried out in the mathematics class and the discussion
of the class results can follow in the science class—with benefit 
to both. The examples used in these notes are intended only 
to suggest to the teacher what can be done; it is important that 
he should use as far as possible data that directly concern the
members of his classes. Only if the class knows and understands
the whole background from which the data have been collected can
a discussion of the statistical implications be profitable. The
emphasis of teaching Statistics in the general curriculum must
be on interpretation rather than on technique.


2. DIAGRAMS AND CHARTS 
The discussion of diagrams and charts could well begin by 
inviting a class to bring to school any interesting diagrams or
charts that they notice in their casual reading of newspapers
and popular periodicals. The material brought will almost
certainly include a sufficient number of good and bad charts to
provide the teacher with illustrative examples and to provoke 
the children to further search. The notes that follow may 
suggest to the teacher lines on which the subject may be 
discussed with 13- or 14-year-old children. 

The purpose of a statistical diagram is to represent numerical
data so that their salient features are more quickly and fully
comprehended than by a study of the original numbers. If a
diagram is to be helpful it must satisfy two conditions. First
it must be simpler to comprehend than the numbers it represents,
and secondly its picture of the numbers and their
interrelations must be a true one. Published diagrams sometimes
fail by being over-elaborate, e.g. too complex or too highly
coloured, or by being misleading, e.g. essentially incomplete
or wrongly dimensioned.

Two main types of simple statistical tables will be considered:
(a) those showing a classification, e.g. the way in which the
pupils of a school are divided into ‘houses’, (b) those giving a
time-sequence of numbers, e.g. the daily attendance at a school.
These we will call class-data and time-data respectively.


2(a) Representation of Class-Data 

Class-data are commonly represented by one of three types of
diagram known as the ‘bar-chart’, ‘pie-chart’ and ‘symbol-chart’.
The choice of chart depends partly on the aspect of the data to
which attention is to be drawn. The three types will be
exemplified by means of the data of Table 1. The units selected
for the bar-chart depend on the space available for the diagram.
In Fig. 1 a centimetre represents the number 20. The figures for
the pie-chart are obtained by multiplying each figure of the
second column by 360/438, rounding off the results to two
figures. As a check the column should be summed to give the
total 360. For the symbol-chart a unit of 10 pupils has been
chosen; fractions less than ½ have been ignored, fractions
greater than ½ have been counted as units.

Fig. 1. Charts of class-data: bar-chart (number of boys in each house), pie-chart (relative sizes of houses) and symbol-chart (one sign for 10 boys).
TABLE 1
Class-data
DISTRIBUTION OF PUPILS AMONG THE SIX HOUSES OF A SCHOOL

House    No. of    Bar-chart    Pie-chart    Symbol-chart
         pupils    (cm.)
A          79        3.95          65°           8
B          85        4.25          70°           8½
C          63        3.15          52°           6
D          48        2.40          39°           5
E          72        3.60          59°           7
F          91        4.55          75°           9
Total     438                     360°

The bar-chart clearly shows the comparative sizes of the six
houses; the human eye is trained at an early age to discriminate
between different lengths suitably placed. The sectors of the 
pie-chart are better able to show the relationship of the parts 
to the whole, i.e. of the size of any one house relative to the 
total school population; schoolchildren of 13-14 years have an 
unfailing judgment of the relative sizes of pie or cake. The 
‘Isotype’ symbol-chart is, for unsophisticated minds, more 
attractive than the formal bar-chart though it is based on 
a similar principle. It is however difficult for a child to make 
an attractive symbol-chart because it is not easy to draw 
repeated symbols which are exactly similar. For display work 
a symbol at least 1 inch in length, cut as a stencil from card- 
board, is helpful. 
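The arithmetic behind the three chart columns of Table 1 can be checked with a short program. The following is only a sketch in Python: the function names are invented for illustration, and the handling of a fraction of exactly one half (kept as a half-symbol) is an assumption read off from the entry for House B.

```python
# Pupils in each house, taken from Table 1.
pupils = {"A": 79, "B": 85, "C": 63, "D": 48, "E": 72, "F": 91}
total = sum(pupils.values())  # 438


def bar_length_cm(count, pupils_per_cm=20):
    """Length of a bar when one centimetre represents 20 pupils."""
    return count / pupils_per_cm


def pie_angle(count, total):
    """Sector angle in whole degrees: count multiplied by 360/total."""
    return round(count * 360 / total)


def symbol_count(count, unit=10):
    """Symbols of `unit` pupils each.

    Fractions under one half are ignored, fractions over one half count
    as a whole symbol, and exactly one half is kept as a half-symbol
    (an assumption based on the 8½ shown for House B).
    """
    whole, rest = divmod(count, unit)
    if 2 * rest < unit:
        return whole
    if 2 * rest > unit:
        return whole + 1
    return whole + 0.5


bars = {house: bar_length_cm(n) for house, n in pupils.items()}
angles = {house: pie_angle(n, total) for house, n in pupils.items()}
symbols = {house: symbol_count(n) for house, n in pupils.items()}

# The check recommended in the text: the angles should sum to 360.
angle_check = sum(angles.values())
```

Rounding each angle separately need not always give a total of exactly 360, which is why the summing check is worth making with any new set of data.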

There still occurs a type of symbol-chart that is misleading 
or ambiguous. When the different classes are represented by 
the same symbol in different sizes it is not usually possible to 
discover easily which dimension has been used as the basis of 
comparison. Areas are much more difficult than lengths to 
compare visually, and if the symbol is drawn to suggest something
solid, e.g. a ship or a sack, there is even more difficulty in
making true comparisons. The checking of published examples of
such symbol-charts provides useful exercises in ratios, square
roots and cube roots for children who can use logarithms. Such
exercises perform an educational function, if only they
encourage children not to accept uncritically everything they
see in print. Classroom models illustrating the ideas of Fig. 2
are essential to ensure that children appreciate the ambiguities
of phrases like ‘twice as big as’ and ‘three times the size of’
applied to areas and volumes.

Fig. 2. Diagrammatic representations of the ratios 1 : 2 : 3; panel (b) shows areas and suggested volumes.
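The square-root and cube-root checks mentioned above can be illustrated numerically. This is a minimal sketch in Python; the function names are hypothetical and the ratios 1 : 2 : 3 are those of Fig. 2.

```python
def linear_factors(ratios, dimensions):
    """Linear scale factors needed so that lengths (dimensions=1),
    areas (dimensions=2) or volumes (dimensions=3) stand in `ratios`."""
    base = ratios[0]
    return [(r / base) ** (1 / dimensions) for r in ratios]


def apparent_ratios(linear, dimensions):
    """Ratios the eye may read if symbols are scaled linearly by
    `linear` but are judged by area or by volume."""
    base = linear[0]
    return [(x / base) ** dimensions for x in linear]


by_length = linear_factors([1, 2, 3], 1)   # heights 1, 2, 3
by_area = linear_factors([1, 2, 3], 2)     # heights 1, sqrt(2), sqrt(3)
by_volume = linear_factors([1, 2, 3], 3)   # heights 1, cube roots of 2 and 3

# Symbols drawn with heights 1 : 2 : 3 suggest much larger ratios
# when they are read as areas or as solids.
seen_as_areas = apparent_ratios([1, 2, 3], 2)    # 1 : 4 : 9
seen_as_volumes = apparent_ratios([1, 2, 3], 3)  # 1 : 8 : 27
```

A symbol drawn three times as tall as another may therefore be read as nine or even twenty-seven times ‘the size’, depending on which dimension the reader takes as the basis of comparison.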




2(b) Representation of Time-Data 

Time-data are represented by charts on which it has become
conventional to mark ‘time’ along the base or x-axis and the
other quantity along the y-axis. Examples of time-charts are
commonly used to introduce the idea of co-ordinates before
work is begun on graphs. (The word ‘graph’ is best applied
to the diagram of a mathematical function.) It is therefore
necessary only to mention here some important questions which
are frequently overlooked.

First, should the points marked on the chart be joined by 
straight or smooth lines—or even joined at all? The answer 
depends partly on the nature of the quantity measured. If it 
is continuous, e.g. the temperature of a test-tube of melted 
wax that is being allowed to cool, then it is reasonable to draw 
a smooth curve through the plotted points. The fact that the 
curve is drawn implies that, at any instant during the period 
of the measurements, the wax has a temperature of which the 
ordinate corresponding to the instant is the best estimate. If 
the quantity measured is discrete, e.g. the daily attendance
at the school, then the joining of successive plotted points
is meaningless, though it is often done. If the quantity
measured is continuous but erratic and is measured at intervals,
e.g. the barometric height or the depth of a reservoir
measured daily, then it is reasonable to join successive points
by straight lines. The straight lines imply continuity of the 
variable but, on the part of the observer, ignorance of the 
intervening fluctuations. 

Secondly, the points are sometimes carelessly plotted so that 
though the general picture they give is correct, there is doubt 
about the detailed information that the chart provides. The 
difficulties are best made clear by asking the class to draw 
charts illustrating data of the following kind: 

(a) Monthly totals.

(b) Stocks at end of month.

(c) Weekly averages per month.
Suitable data for exercises from which items of topical interest 
can be chosen can be found in any issue of the Monthly Digest
of Statistics (H.M.S.O. 2s. 6d.). When the diagrams are drawn
it is an instructive exercise to redistribute them among the 
pupils and ask them to reconstruct the table of data on which 
the diagrams are based, or alternatively to read from the 
diagrams a few selected values. 


2(c) Trends, Seasonal Movements and Random Fluctuations in 
Time-Charts 

As time-charts are practically so important it may be worth 
spending two or three lessons on their analysis, particularly for 
pupils who are unlikely to continue their academic careers 
beyond the school-leaving age. The main ideas of the simple 
treatment of time-series analysis described below can be appre- 
ciated by children who have drawn only four or five charts, if 
the data are well chosen. 

The feature of a statistical time-chart that is most obvious 
to a schoolboy is its irregularity. It may suggest broadly a 
simple pattern of recurrent peaks or a rather uncertain wander- 
ing up and down from one side of the chart to the other, but 
superimposed on the main trend there is usually a rather 
dazzling succession of random fluctuations. We need some 
simple method of separating the wood from the trees, i.e. of 
‘smoothing’ the random irregularities so that the main features 
of the time-chart can be more readily seen and described. 

The simplest way of showing the trend of an irregular chart 
is to draw a smooth curve ‘through’ the points, ignoring the 
minor fluctuations. This can be done visually but it takes a 
practised eye to do it well. The more sophisticated way of 
calculating a series of ‘moving averages’ is not difficult and 
is much more interesting. The averages of successive sets of 
readings (say 10) are calculated and are plotted on the graph 
so that each corresponds with the centre of the period spanned 
by the set. If the average of the first 10 readings $y_1, y_2, \ldots, y_{10}$ is $m_1$, then the average $m_2$ of the next 10 readings, $y_2, y_3, \ldots, y_{11}$, is given by

$$m_2 = m_1 + (y_{11} - y_1)/10$$

and so on.
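The step-by-step rule for passing from one moving average to the next can be sketched in Python as follows. The function names and the sample readings are invented for illustration; only the updating rule itself comes from the text.

```python
def moving_averages(readings, span):
    """Moving averages of `readings`, each found from the one before by
    adding (newest reading - oldest reading dropped) / span."""
    if span < 1 or span > len(readings):
        raise ValueError("span must lie between 1 and the number of readings")
    current = sum(readings[:span]) / span  # average of the first set
    averages = [current]
    for k in range(span, len(readings)):
        current += (readings[k] - readings[k - span]) / span
        averages.append(current)
    return averages


def plotting_positions(count, span):
    """Position, counted from the first reading as 0, of the centre of
    the period spanned by each set; each average is plotted there."""
    return [i + (span - 1) / 2 for i in range(count - span + 1)]


# Twelve made-up readings, so that a span of 10 gives three averages.
readings = [3, 5, 4, 8, 6, 7, 9, 5, 6, 8, 10, 4]
averages = moving_averages(readings, 10)
positions = plotting_positions(len(readings), 10)  # 4.5, 5.5, 6.5
```

With an even span such as 10 the centre of each set falls midway between two readings, which is why the plotting positions are not whole numbers.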


Fig. 3. Moving averages of cyclic data (mean daily temperature at sea-level; England and Wales). Vertical axis: temperature (°F.). Key: basic plotted curve; moving average curve, 6-month span; moving average curve, 12-month span.

Fig. 4. Random fluctuations from the mean seasonal change. Vertical axis: temperature (°F.). Key: basic plotted curve; monthly mean for years 1906-1935.


If the time-chart shows a regular periodicity then some care 
in the choice of the span of the moving average is needed. If 
the periodicity recurs after 12 readings for example, as it does 
with any quantity that varies seasonally and of which monthly 
readings are taken, then sets of 12 (or some multiple of 12) 
readings must form the span of the moving average in order 
to smooth out the periodicity. If we wish to examine the 
periodic fluctuation itself then the averages of readings cor- 
responding to successive phases of the period are calculated 
and plotted to show the average cycle. 

All these points are illustrated in Figs. 3 and 4. Super- 
imposed on the original chart are the charts of moving averages 
with 6-month and 12-month spans (Fig. 3). As we should 
expect, the 12-month span smooths out the seasonal fluctua- 
tions and shows that there is no appreciable change in mean 
temperature from year to year during the years considered. 
The curve based on the means for January, for February, etc., 
is also drawn (Fig. 4), and repeated from year to year; it shows 
the random fluctuations from the mean seasonal change, and 
enables us to say for any particular month whether or not it 
was ‘hot’ or ‘cold’ for ‘the time of the year’. 
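The effect of the span on a seasonal series, and the use of the average cycle to judge a particular month, can be sketched in Python. The temperatures below are invented for the purpose and are not the data of Figs. 3 and 4; the function names are likewise hypothetical.

```python
def moving_averages(readings, span):
    """Average of each successive set of `span` readings."""
    return [sum(readings[i:i + span]) / span
            for i in range(len(readings) - span + 1)]


def average_cycle(readings, period):
    """Mean of the readings at each phase of the period, e.g. the mean
    of all the Januaries, of all the Februaries, and so on."""
    return [sum(readings[phase::period]) / len(readings[phase::period])
            for phase in range(period)]


def departures(readings, period):
    """Each reading minus the average for its phase of the cycle."""
    cycle = average_cycle(readings, period)
    return [r - cycle[i % period] for i, r in enumerate(readings)]


# Invented monthly mean temperatures (deg F), January to December.
seasonal = [40, 40, 43, 47, 53, 58, 61, 61, 57, 51, 45, 41]

# Three identical years: a series with seasonal movement only.
pure = seasonal * 3
span12 = moving_averages(pure, 12)  # constant: the cycle is smoothed out
span6 = moving_averages(pure, 6)    # still rises and falls with the seasons

# The same series with one unusually warm July in the second year.
observed = list(pure)
observed[18] += 6
july_departures = departures(observed, 12)[6::12]  # first, second, third July
```

A positive departure marks a month that was ‘hot for the time of the year’ and a negative one a month that was ‘cold’.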

Data suitable for illustrating these procedures with time- 
charts, if not readily available in the school, can be found in 
any issue of the Annual Abstract of Statistics (H.M.S.O. 10s.), 
but data of local or topical interest should be used if possible. 
Some suitable exercises are: 1. Examination of the daily 
absentee figures to see if there is any weekly periodicity (e.g. 
in schools at which attendance on Saturdays is required the 
figures may show a marked increase in the number of absentees 
on that day). 2. Day-to-day recording of the local meteoro- 
logical data. These are of more interest if they are compared 
with the corresponding data from another locality, e.g. those 
published daily in The Times for London, or data provided by 
another school in a different part of the country. They should 
at least be compared with the known means for the locality 
with or without appropriate moving averages. 




2(d) Further Points about Charts 

Sometimes we may need to compare large numbers that are 
almost equal, where the important characteristic is not the 
approximate equality of the numbers but their differences. 
The bar-chart, as already described, may then fail to illustrate 
the characteristic in which we are interested. Is it reasonable 
to enlarge the scale of the bar-chart and chop off some of the 
inconvenient length? Yes, as long as we make clear what has 
been done. An alternative method is sometimes possible. If 
the large numbers were, for example, the number of pupils 
attending school, it might be more instructive to consider the 
number of absentees instead of the number of those present. 

Finally, any chart should be as far as possible complete and 
self-explanatory. The title and the legends should be brief but 
adequate; any scale of numbers should be clearly marked. 


2(e) Maps 

The regional distributions of particular economic activities 
are very conveniently summarised by using maps of the region 
marked in a suitable way. Children are more likely to be 
interested in such maps if they have had opportunities of 
making them. Many good examples will be found in modern 
books on economic geography (e.g. English County: A Planning
Survey of Herefordshire. Faber). With maps as with charts
the information must be simply and fairly conveyed; false im- 
pressions are given, for example, by maps showing the results 
of parliamentary or local government elections by shading 
areas. A suitable class exercise is to prepare a map of the 
school’s locality to illustrate “The Journey to School’. The 
pupils’ homes could be marked on the appropriate places by 
dots (say 1 dot for 5 pupils); their various methods of travel 
(bus, train, bicycle, etc.) could be indicated by streams of 
different colours; their routes by streams of width proportional 
to their traffic density. The collection and summarising of the 
necessary data is in itself a useful exercise; inevitably there 
will be cases which defy classification into any simple scheme
and which will demand more careful definition of the classes.
The data can be used in later exercises on sampling. 


To maintain the children’s interest in diagrammatic repre- 
sentation some space on the classroom wall should be given to 
well-drawn charts illustrating subjects of topical interest, e.g. 
the local barometric data, sports results, house competitions, 
etc., and to a ‘rogue’s gallery’ of bad and misleading charts 
brought by pupils. 


3. PERCENTAGES, RATES AND RATIOS


A common defect in the teaching of percentages, rates and 
ratios, topics to which much time is usually given in school
mathematical courses, is that too much emphasis is placed on 
the techniques of calculating them and too little on the reasons 
for their use, on their interpretation, and on the selection of 
the standard or base of the comparison. The techniques are 
comparatively simple but it seems to be too readily assumed 
that if children learn to know ‘how’ they will by some inner 
light learn also to know ‘why’. That many intelligent children 
do not must be obvious if the frequent elementary mistakes 
and confusions to be met in everyday life are noted. Thus, to 
give one example, the B.B.C. recently announced that London 
taxi-drivers were on strike because their claim for an increase 
of 7% in their share of the metered fares they take had been 
refused by the taxi owners. What the drivers claimed in fact 
was an increase in their share from 33⅓% to 40%, i.e. an
increase of 20%—which perhaps more readily explains why 
a strike occurred. Here it is evident that confusion exists about 
the basis of comparison, though sometimes similar difficulties 
arise because the speaker or writer cannot express himself 
exactly even if he knows that precise expression is necessary. 
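The two figures in the taxi-drivers' example differ only in the base of the comparison, as a short calculation shows. This is a sketch in Python using exact fractions; the variable names are invented for illustration.

```python
from fractions import Fraction

old_share = Fraction(100, 3)  # 33 1/3 per cent of the metered fares
new_share = Fraction(40)      # 40 per cent of the metered fares

# Base = the metered fares: the share rises by 6 2/3 points, about 7.
increase_in_points = new_share - old_share

# Base = the drivers' present share: their takings rise by 20 per cent.
increase_relative = (new_share - old_share) / old_share * 100
```

Both statements are arithmetically correct; the confusion arises only when the base is left unstated.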

Though the early teaching may be sound, many text-books 
of arithmetic contain a section of ingenious ‘harder problems’ 
which is well contrived to shatter for ever the confidence of 
most children in rates, ratios and percentages. It is suggested 
that the time spent on these trickier problems could be better
spent in discovering how to use rates, ratios and percentages
in practical or social problems. A typical ‘harder problem’ 
reads: ‘A man would gain 20% by selling an article for 9s. 6d. 
and 15% by selling a second article for 7s. 2d.; for what does 
he sell the second article if there is no loss or gain on the two 
sales?’ For many children who have reached the stage (at 15
years) at which they are expected to solve this kind of puzzle 
a discussion of some measures of social importance, e.g. the 
cost-of-living index, would be of greater educational value. 
But even at an earlier stage an important point is commonly 
missed. Consider the simpler problem: ‘On farm A 135 tons of 
wheat were harvested from 180 acres; on farm B 150 tons were 
harvested from 200 acres. Which farm had the greater yield 
per acre?’ This problem gives little difficulty to many children 
who would be unable to answer in precise terms the question 
in this form: ‘How could you find out which of two farms gave 
the greater yield of wheat per acre? What information would 
you need to know?’ By putting the question in the numerical 
form it becomes a matter only of technique; the second form 
of the question requires a deeper understanding of the problem 
which children are usually expected to attain simply by 
repeated working of numerical questions. 
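In its numerical form the farm problem reduces to one division for each farm, as the following sketch in Python shows. The function name is hypothetical; the figures are those printed above.

```python
from fractions import Fraction


def yield_per_acre(tons, acres):
    """Tons of wheat harvested per acre, kept as an exact fraction."""
    return Fraction(tons, acres)


farm_a = yield_per_acre(135, 180)
farm_b = yield_per_acre(150, 200)
same_yield = farm_a == farm_b
```

On the figures as printed here both farms give three-quarters of a ton per acre. The second form of the question asks the pupil to see for himself that two quantities, the tonnage and the acreage, are needed for each farm before any division can be done.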

Some problems needing judgment in the selection of appro- 
priate measures of comparison are required in place of ‘harder 
problems’ or of some of the many problems which are merely 
straightforward applications of technique. Text-book and 
examination questions almost always specify the form in which 
the answer must be given. What percentage . . .? What is 
the proportion . . .? The advantage to the teacher and the 
examiner is that such questions have only one answer that is 
correct, so that marking is simplified, though much of the 
value of the problem to the pupil is lost. More examples of the 
following kind would be helpful: 

1. Two football clubs A and B played a match which A won
by 2 goals to 1. Would you approve or disapprove of the
following alternative descriptions of the result:

(a) A won by 100%,

(b) A won by 50%,

(c) B lost by one goal,

(d) A was twice as good as B?

Give another way of stating the result which you think is
suitable.

2. An article which cost 2d. in 1940 cost 6d. in 1950. State 
the increase in cost in three ways. Which is the most suitable 
way of describing this increase? 

3. There are 20 pupils absent from school A and 15 pupils 
absent from school B. What further information would you 
need in order to make a fair comparison of absence from the 
two schools? Why? 
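The different ways of stating one and the same change, which the three examples above turn on, can be set side by side. The sketch below is in Python; the function names are invented, and the numbers on roll in the last lines are hypothetical figures chosen only to show why such information is needed.

```python
from fractions import Fraction


def describe_change(old, new):
    """Three descriptions of a change from `old` to `new`."""
    return {
        "difference": new - old,
        "ratio": Fraction(new, old),
        "percentage_change": Fraction(new - old, old) * 100,
    }


def absence_rate(absent, on_roll):
    """Absentees as a percentage of the pupils on the roll."""
    return Fraction(absent, on_roll) * 100


# Example 2: an article costing 2d. in 1940 and 6d. in 1950.
price = describe_change(2, 6)

# Example 1: the same score gives 100% or 50% according to the base.
a_over_b = describe_change(1, 2)["percentage_change"]  # base: B's one goal
b_under_a = describe_change(2, 1)["percentage_change"]  # base: A's two goals

# Example 3, with hypothetical rolls of 400 and 250 pupils.
school_a = absence_rate(20, 400)
school_b = absence_rate(15, 250)
```

With these assumed rolls the school with fewer absentees has the higher rate of absence, which is the point of asking what further information is required.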


4. COMPOUND UNITS

Unless mechanics is taught in the school children rarely meet 
any of the compound units that are widely used in practical 
affairs. In fact some care to avoid them seems to be taken. 
A question from a modern text-book reads: ‘A can do a piece 
of work in 30 days and B can do it in 6 days. How long will 
A and B take working together?’ Intelligent children find such 
questions unrealistic. Though the teacher may see the question 
merely as a variant of the bath-tap problem and solve it auto- 
matically as such, many children would be conscious of the fact 
that two workers with such dissimilar capacities for work are 
unlikely to combine as harmoniously as the teacher assumes. 
Why are the practical units such as man-hours, ton-miles, 
kilowatt-hours, passenger-miles, acre-feet, etc., ignored? They 
are simple in principle, easy to manipulate, and their intro- 
duction to school courses would help to link the classroom with 
everyday affairs. Two suggested easy questions are as follows: 

1. The re-surfacing of a road is estimated to take 4800 man- 
hours. The men work 8 hours per day; 30 men are available 
for 10 days and for the remainder of the time only 12 men are 
available. How many working days should the whole work 
take? 

2. A wireless set consumes 80 watts when it is switched on. 
Estimate its cost per month of 30 days if it is used for an
average of 2½ hours per day and if the supply charge is 2d. per
unit (kilowatt-hour). 
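Both questions can be worked directly in the compound units, as the following sketch in Python suggests. The function names are hypothetical, and the daily use of the wireless set is taken as 2½ hours, which is an assumed reading of the question.

```python
from fractions import Fraction


def road_days(total_man_hours, hours_per_day, first_men, first_days, later_men):
    """Working days needed when `first_men` work for `first_days` and
    `later_men` then finish the remaining man-hours."""
    done = first_men * first_days * hours_per_day       # man-hours so far
    remaining = total_man_hours - done
    later_days = Fraction(remaining, later_men * hours_per_day)
    return first_days + later_days


def monthly_cost_pence(watts, hours_per_day, days, pence_per_unit):
    """Cost in pence, one unit being one kilowatt-hour."""
    units = Fraction(watts) * Fraction(hours_per_day) * days / 1000
    return units * pence_per_unit


# Question 1: 4800 man-hours, 8-hour days, 30 men for 10 days, then 12 men.
days_needed = road_days(4800, 8, 30, 10, 12)

# Question 2: 80 watts for an assumed 2 1/2 hours a day over 30 days at 2d.
cost = monthly_cost_pence(80, Fraction(5, 2), 30, 2)
```

On these figures the road takes 35 working days and the wireless set uses 6 units, costing 12d. or one shilling a month.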


5. AVERAGES 

The average or arithmetic mean is the statistical measure 
most commonly used for making numerical comparisons. The 
principle of its computation is well understood but its limita- 
tions are rarely mentioned in school. Children see the average 
widely and indiscriminately used: they often see their marks 
in History, Art, Latin, Mathematics, Religious Knowledge, 
Science and Woodwork solemnly summed and averaged; 
minute differences in the result may serve to quell all possible 
disputes about the most worthy recipient of the form prize. 
Schools have long accepted the average as an infallible device 
for picking the winner. Out of school, the child sees in his 
newspaper the cricket averages calculated to two places of 
decimals and in ordered sequence; he reads that a difference 
of 0.001 in goal average may decide important questions of
promotion and relegation in the football leagues. The average 
is only too evidently the final arbiter of all statistical problems, 
though unexpressed doubts may arise in boys’ minds when 
they notice that Test Match selectors do not automatically 
choose the batsmen and bowlers with the best averages. 

Four modest extensions of the usual limited teaching of the 
average are suggested for the general curriculum: 

(a) That some consideration be given to the dispersion or 
‘spread’ of numbers averaged. 

(b) That it should be shown that the average is, for suitable 
data, merely a convenient shorthand description, useful for 
purposes of comparison, but that it is not an end in itself. 

(c) That the computation of the average be taught more 
systematically. 

(d) That commonly used applications of the average be more 
fully discussed. 

Even without a quantitative measure of dispersion children 
can be shown, if their teacher takes his opportunities, that the 
average by itself is often inadequate as a description. Thus two
regions with approximately equal annual rainfall differ
considerably in climate, scenery and economy if in one the rain
falls only in the winter and in the other throughout the year; 
rivers that are either in flood or are dry are of much less use to 
man than a river which flows steadily throughout the year; 
manufacturers aim at uniformity in the quality of their pro- 
ducts so that their customers can rely on getting reasonable 
value for their money; and so on. Sometimes the minimum or 
the maximum value is more important than the mean; the 
strength of a chain is determined by its weakest link; a bridge 
has to be designed so that it will safely withstand the estimated 
maximum stresses. Much can be done by comment of this kind 
whenever suitable numerical data are being used or discussed. 

When an average is computed it is implicit in the operation, 
firstly, that the data used are homogeneous, and secondly, that 
the result is going to provide additional information that is 
useful. If, for example, a boy gets 5% for French and 95% 
for Mathematics, is it reasonable to add the marks together 
and is it helpful to know that his ‘average’ is 50%? It may 
of course be helpful to know that of two forms taking the same 
examination paper one form has an average of 60% and the 
other of 40%, but the calculation of the average as a matter 
of course or the averaging of data that are not homogeneous 
has nothing to recommend it. Discussion of the results of class 
experiments in the science laboratories can be used to illustrate 
some of the properties of the average and incidentally to help 
in the inculcating of the scientific attitude. Discrepancies 
between individual results, between the average result and 
the correct result (if known), can often throw light not only 
on individual errors but on deficiencies of technique. Small 
discrepancies, unaccountable at first, have often produced 
scientific results of great interest and importance (Lord Ray- 
leigh’s discovery of argon is a classic example) and children 
should be encouraged not to ignore them. Even the simple 
experiment of separating salt and sand by washing out the salt 
from a weighed sample followed by the drying and weighing 
of the sand residue produces interesting results. What are the
basic assumptions? That the sand is completely insoluble, that
the salt is completely soluble in water, and that the washing 
water contains no suspended matter. Accepting these, what 
are the likeliest sources of experimental error? Insufficient 
washing out of the salt (How can we test the washings?), in- 
complete drying of the insoluble residue (How can we ensure 
that the residue is completely dried?). What effects would such 
errors have on the result of the experiment? What is the best 
way of stating the proportion of salt in the sample? Why are 
the results not all equal (assuming arithmetical errors to have 
been dealt with)? Was the original mixture homogeneous? 
Would it be sensible to average the result? What is the average 
result? Why is it different from the correct result (known from 
the composition of the mixture)? What is the range of the 
results? If we repeated the experiment, avoiding all our
known mistakes, what should happen to the range and the
average of the results? Let us see if it does.

A more advanced experiment is typified by the determina- 
tion of the internal resistance of a Daniell cell. The experiment 
is carried out in the usual way with no special care. What is 
the range of the results? Were repeated results equal? Why 
not? Does the internal resistance of a cell vary? Are the 
differences ascribable entirely to unknown experimental errors? 
Can we test this? Consider the cell; what would affect its 
internal resistance? Dirty terminals? Poor contacts? Dirty 
pots? Strength of copper sulphate solution or of nitric acid? 
Try again with clean terminals, fresh saturated copper sulphate 
solution, nitric acid of the specified strength, clean zinc rods, 
etc. Is the range of the results reduced? Discussion of the 
results is not merely a statistical exercise; it can teach much 
about laboratory physics. 

As a chemical experiment consider the determination of the 
equivalent weight of copper. This is usually done by at least 
two methods, (a) by oxidising a known weight of copper to
copper nitrate in a weighed test-tube and converting the nitrate
to cupric oxide, (b) by reducing cupric oxide to copper in a
stream of coal gas. Comparison of the means and ranges of the
two sets of results can show which method is the better and
a discussion may explain why. To make the discussion realistic 
it must be conducted by the teacher who was in charge of the 
laboratory experiments—only he has the detailed background 
knowledge which is essential in interpreting the results. 

In a simple course of human biology each member of the 
class can measure another member’s ‘vital lung capacity’ (i.e. 
the volume of air breathed out during a deep exhalation). 
From these data the average capacity of all members of the 
class can be computed. Then for each member a series of five 
measurements at intervals can be taken. From these sets of 
data an average for each individual can be found. Finally, a 
series of measurements in quick succession can be made on one 
member of the class. As he tires his lung capacity becomes 
progressively smaller; the danger of accepting an average where 
its use disguises a trend can be demonstrated. 

From such discussions some important statistical principles 
should begin to emerge: 

(a) That any individual experimental result is usually subject 
to small random errors and should be regarded not as the result 
but as an estimate of the result. 

(b) That the mean of a number of results (when the mean 
is appropriate) is still only an estimate, though one in which 
more confidence can be placed than in any individual result. 

(c) That still more confidence can be placed in the mean 
result if, by careful technique, the spread of the individual 
results can be reduced. 

(d) That differences between individual results, or between 
the mean and the correct result, are only to be expected; that 
they are not necessarily ‘real’ or ‘significant’, but arise by 
chance. 

(e) That some differences are real or significant and must 
be explained in terms of some defect of technique or of some 
weakness in the basic assumptions (e.g. that the salt and sand 
were thoroughly mixed when in fact they were not). 

A difficulty arises here: how is it possible for the teacher
to decide whether a difference is ‘significant’ or not without
an elaborate statistical computation? A rough working rule
for the teacher would be useful. If we assume that the
distribution of results is approximately Normal with standard
deviation $\sigma$, it is known that the mean range, $\bar{r}$, of samples
of 4 results tends to the value $2.06\sigma$ as the number of
samples increases. By noting the ranges of the results taken
in sets of 4 as they are worked out, an estimate of $\sigma$ is obtained
by halving the mean range. If $m$ is the calculated mean result
and $E$ the expected result, then the difference $d = |E - m|$
is ‘significant’, i.e. is highly improbable as a matter of chance, if

$$\frac{d}{\sigma/\sqrt{4}} > 4, \quad \text{or} \quad d > \bar{r}.$$

For any individual result a deviation from the mean which is
greater than $3\bar{r}/2$ is also highly improbable as a chance result.
It must be emphasised that these rules are only rough
approximations intended to help the teacher and not to be given to
the class: for a more elaborate analysis, which is necessary
only as a matter of interest and as an occasional check, tables
of the Normal probability function and of the t-test will be
required.
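
The rough rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the salt percentages are hypothetical, and the thresholds follow the rule as read here ($\sigma$ taken as half the mean range of sets of 4; the mean of a set of 4 treated as suspect when $d$ exceeds the mean range; an individual result treated as suspect when it lies more than $3\bar{r}/2$ from the mean).

```python
def mean_range(results, size=4):
    """Mean of the ranges of successive sets of `size` results."""
    sets = [results[i:i + size]
            for i in range(0, len(results) - size + 1, size)]
    if not sets:
        raise ValueError("need at least one complete set of results")
    ranges = [max(s) - min(s) for s in sets]
    return sum(ranges) / len(ranges)


def mean_is_suspect(m, expected, r_bar):
    """Rough rule for the mean of 4 results: d > r_bar (sigma taken as r_bar / 2)."""
    return abs(expected - m) > r_bar


def result_is_suspect(x, m, r_bar):
    """Rough rule for one result: deviation from the mean greater than 3 * r_bar / 2."""
    return abs(x - m) > 1.5 * r_bar


# Hypothetical class results: percentage of salt found in a salt-sand mixture.
results = [24.1, 25.3, 24.8, 25.9, 24.6, 25.2, 25.7, 24.4]
r_bar = mean_range(results)            # mean of the two set ranges
sigma_estimate = r_bar / 2             # rough estimate of the standard deviation
first_set_mean = sum(results[:4]) / 4  # mean of the first set of 4

print(round(r_bar, 2), round(sigma_estimate, 3))
print(mean_is_suspect(first_set_mean, 25.5, r_bar))
```

As the text stresses, such a check is an aid to the teacher's judgment and not a substitute for the Normal and t tables.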

For the more systematic computation of the average it is
necessary to treat it not only as an arithmetical exercise in
summation and division, but also as an exercise in algebra.
There is nothing difficult in the use of $\Sigma$ as shorthand for ‘the
sum of all things like . . .’ and it can be introduced much
earlier than with the summation of finite series, as is generally
the case. Once the use of $\Sigma$ has been explained by the use of
some simple numerical examples, the arithmetic mean of $x_1$,
$x_2$, . . . $x_n$ can be defined as

$$\bar{x} = \frac{1}{n}\sum_{r=1}^{n} x_r.$$

Simple extensions of this formula which are useful in the rapid
computing of averages, such as those required for the discussion of
experimental results obtained from the laboratory, are:


(a) If $x_r = a + y_r$, etc., where $a$ is a constant, then

$$\bar{x} = \frac{1}{n}\sum_{r=1}^{n} x_r = \frac{1}{n}\sum_{r=1}^{n}(a + y_r) = \frac{1}{n}\Bigl(na + \sum_{r=1}^{n} y_r\Bigr) = a + \frac{1}{n}\sum_{r=1}^{n} y_r,$$

e.g. the mean of 981.6, 980.5, 980.7 and 982.3 is equal to
980 + (the mean of 1.6, 0.5, 0.7 and 2.3).
(b) If $x_r = c\,y_r$, etc., where $c$ is a constant, then

$$\bar{x} = \frac{1}{n}\sum_{r=1}^{n} x_r = \frac{1}{n}\sum_{r=1}^{n} c\,y_r = \frac{c}{n}\sum_{r=1}^{n} y_r,$$

e.g. the mean of 925, 825, 875 and 975 is equal to 25 × (the
mean of 37, 33, 35 and 39). This result is useful to introduce
before frequency distributions are analysed.

(c) If $\bar{x}_1$ is the mean of a set of $n_1$ numbers, and $\bar{x}_2$ is the
mean of a set of $n_2$ numbers, then the mean of the set obtained
by combining the two sets of $n_1$ and $n_2$ numbers is given by

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}.$$

(d) The above formula has many and diverse applications.
Expressed in the more general form as

$$\bar{x} = \frac{\sum w_r x_r}{\sum w_r}$$

it is known as the weighted mean. The ‘weights’, $w_1$, $w_2$, $w_3$,
. . . $w_n$, or their ratios, are usually based on statistical data
but, as in the ‘weighting’ of examination results, they may be
arbitrarily assigned. In mechanics it is used in problems involving
parallel forces or the dynamics of a number of particles;
in co-ordinate geometry one of its uses is to give the co-ordinates
of a point which divides a line in a given ratio; in economic
statistics it is of importance in constructing compound index
numbers. Because of its wide applicability it deserves greater
attention in elementary mathematics than it receives at
present.
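
The short-cuts (a), (b) and (c) can be checked numerically. The sketch below is an illustration only; the helper names and the two small sets used for (c) are hypothetical, while the figures for (a) and (b) are those quoted in the text.

```python
from math import isclose


def mean(values):
    return sum(values) / len(values)


# (a) Subtract a constant, average the small remainders, add the constant back.
readings = [981.6, 980.5, 980.7, 982.3]
shift_mean = 980 + mean([x - 980 for x in readings])

# (b) Divide out a common factor, average, then multiply back.
numbers = [925, 825, 875, 975]
scale_mean = 25 * mean([x // 25 for x in numbers])


# (c) Combine two means without returning to the original numbers.
def combined_mean(n1, mean1, n2, mean2):
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


set1, set2 = [4, 6, 8], [10, 20]   # hypothetical sets
pooled = combined_mean(len(set1), mean(set1), len(set2), mean(set2))

print(isclose(shift_mean, mean(readings)))   # short-cut agrees with the direct mean
print(scale_mean, mean(numbers))
print(isclose(pooled, mean(set1 + set2)))
```

Formula (c) is already a weighted mean in which the weights are the sizes of the two sets, which is the step that leads to (d).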

Some examples: 

1. The average wages of men and women in an industry are 
107s. and 92s. per week respectively. The ratio of men to 
women employed in the industry is 3 : 2. What is the average 
wage for all workers? 

2. Iron is a mixture of two isotopes of atomic weight 54
and 56. The atomic weight of the mixture is 55.84. What is
the composition of iron in terms of its isotopes?

3. 500 c.c. of a 2N solution of hydrochloric acid are added
in error to a bottle containing 7500 c.c. of a 0.1N solution of
the same acid. What is the strength of the mixture?

4. Metallic ores containing 25%, 21%, and 18% of the metal
are used in the ratios 3 : 5 : 2 respectively. What is the content
of the metal in the mixture?

5. In a certain industry the average weekly wages of men, 
women and juveniles are 112s., 85s. and 48s. and the ratios of 
the numbers of each employed are 6 : 3 : 1 respectively. What 
is the average weekly wage for all the workers of the industry? 

6. The average humus content of a sandy soil is 11.2%, and
that of a peaty soil is 53.8%. In what proportions should the
two soils be mixed to provide a soil with a humus content of
25.4%?
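
One way of checking answers to such exercises is sketched below. The two helper functions are hypothetical names chosen for illustration, and the target humus content in example 6 is read as 25.4%; examples 2 and 6 use the weighted mean in reverse, solving for the proportion instead of the mean.

```python
def weighted_mean(values, weights):
    """Sum of w * x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def share_of_second(first, second, target):
    """Fraction of the mixture that must be `second` for the mean to equal `target`."""
    return (target - first) / (second - first)


wage_1 = weighted_mean([107, 92], [3, 2])          # example 1, shillings per week
acid_3 = weighted_mean([2.0, 0.1], [500, 7500])    # example 3, normality of the mixture
ore_4 = weighted_mean([25, 21, 18], [3, 5, 2])     # example 4, per cent of metal
wage_5 = weighted_mean([112, 85, 48], [6, 3, 1])   # example 5, shillings per week
iron_2 = share_of_second(54, 56, 55.84)            # example 2, fraction of isotope 56
soil_6 = share_of_second(11.2, 53.8, 25.4)         # example 6, fraction of peaty soil

print(wage_1, round(acid_3, 5), round(ore_4, 1), wage_5)
print(round(iron_2, 2), round(soil_6, 3))
```

On this reading the soils of example 6 are mixed in the ratio of two parts sandy to one part peaty.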

A further property of the mean that is important is that the
algebraic sum of the deviations from the mean is zero, i.e.
$\sum (x - \bar{x}) = 0$. If elementary mechanics is taught it is instructive
to find the mean of a few numbers practically by loading
a light beam and finding the point of balance, e.g. to find the
mean of 66, 92, 5, 12, 70 and 55, suspend 1 lb. weights at 66 cm.,
92 cm., etc., from one end. The point of balance in this example
is at 50 cm., so that the weight of the beam does not affect the
result, but small deviations of the mean from the centre of the
beam will not seriously affect the result. The metre rule can
also be used to demonstrate the weighted mean.
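
The balance experiment can be mirrored numerically. In this sketch the positions are those of the example in the text; the unequal weights in the second part are hypothetical and are added only to show the weighted mean as a point of balance.

```python
# Equal 1 lb. weights hung at these distances (cm.) from one end of the rule.
positions = [66, 92, 5, 12, 70, 55]
balance = sum(positions) / len(positions)

# Moment of each weight about the point of balance; the moments cancel.
moments = [p - balance for p in positions]

# Hypothetical unequal weights: 2 lb. at 20 cm. and 1 lb. at 80 cm.
loads = [(20, 2), (80, 1)]                      # (position, weight)
weighted_balance = sum(p * w for p, w in loads) / sum(w for _, w in loads)

print(balance, sum(moments), weighted_balance)
```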




6. INDEX NUMBERS 

Though there are many practical difficulties in the computa- 
tion of index numbers the ideas on which they are based, 
percentages and weighted means, are well within the grasp of 
14-15-year-old children. The simplest type of index number 
is that which measures the price fluctuation of a single com- 
modity. The price of a commodity is noted over a period and 
a norm is arbitrarily selected; it may for example be the price 
of the commodity at a given date, or its average price over a 
comparatively stable period. The price is then used as a stan- 
dard against which later prices are compared; the later prices 
are expressed as percentages of the norm. Changes in the index 
number are easier to appreciate than changes in the cash prices 
of the commodity, and the index numbers of different
commodities are easier to compare than the cash prices of those
commodities. Thus for strawberries sold in England and Wales 
and with the average price for 1936/38 as 100: 


Year            1936      1937       1938       1939      1940       1941
Price per lb.   9d.       9d.        10¼d.      9¼d.      1s. 0¾d.   1s. 7¾d.
Index           95        95         109        98        135        210

Year            1942      1943       1944       1945      1946
Price per lb.   1s. 3d.   1s. 3½d.   1s. 2¾d.   1s. 4d.   1s. 2½d.
Index           159       165        156        170       154


Similar simple index numbers for other fruits are made and 
from these data a composite index number for all fruits is 
computed as a weighted mean. In the same way composite 
index numbers for vegetables and glasshouse products, for 
livestock and livestock products, and for cereals and farm 
crops are computed from their several items. The mean of 
the composite index numbers, suitably weighted, yields the 
‘agricultural price index’. 
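
The two steps, a simple index for each commodity followed by a weighted mean of the simple indices, might be sketched as follows. All prices and weights here are hypothetical and chosen only to keep the arithmetic easy; they are not taken from any published index.

```python
def simple_index(price, base_price):
    """Price expressed as a percentage of the chosen norm."""
    return 100 * price / base_price


def composite_index(indices, weights):
    """Weighted mean of the simple index numbers."""
    total_weight = sum(weights[name] for name in indices)
    return sum(indices[name] * weights[name] for name in indices) / total_weight


# Hypothetical (base price, current price) in pence per lb.
prices = {
    "apples": (6.0, 7.5),
    "plums": (4.0, 6.0),
    "strawberries": (10.0, 9.0),
}
# Hypothetical weights, e.g. proportional to the value of each crop sold.
weights = {"apples": 5, "plums": 3, "strawberries": 2}

indices = {name: simple_index(now, base) for name, (base, now) in prices.items()}
fruit_index = composite_index(indices, weights)

print(indices)
print(fruit_index)
```

Changing the weights while leaving the prices alone changes the composite index, which is a convenient way of showing a class why the choice of weights matters.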

The calculation of a few composite index numbers from the 
simple index numbers provides good exercises in the use of the 
weighted mean. The Monthly Digest of Statistics (H.M.S.O.
2s. 6d.) and the Monthly Bulletin of Statistics published by the
Statistical Office of the United Nations (H.M.S.O. 2s. 6d.) and 
their occasional supplementary volumes provide much suitable 
data of topical interest and information about index numbers. 

At the same time the exercises provide opportunities of men- 
tioning some of the uses and abuses of index numbers and of 
removing some misconceptions. A discussion of cost-of-living 
index numbers is worth while, if the following points are 
emphasised: 

(a) Cost-of-living index numbers of different countries are 
computed in different ways to serve different purposes and are 
not strictly comparable. 

(b) They do not indicate whether it costs more to live in one 
country than in another. 

(c) They need to be modified from time to time to reflect 
changes of taste, habits and conditions (as was done in June 
1947 in the United Kingdom). 

(d) They are averages based usually on samples of city 
working-class expenditure; large deviations may occur in 
individual cases and from one region of a country to another. 

For more advanced mathematical work two additional items 
might be considered: 

(a) The weighted geometric mean as an index; its advantages 
and its computation. 

(b) The change in a composite index number caused by small 
changes in its components. 


7. MEASURE OF DISPERSION


The introduction at this stage of any measure of dispersion 
other than the range is hardly justifiable, unless time is avail- 
able for a first study of frequency distributions. The danger of 
introducing the standard deviation (or even the mean deviation 
from the mean) too early is that it may be learnt (and taught) 
merely as a trick of technique. For many classroom purposes 
the simple range is adequate, but its main defect as a measure 
of dispersion is that it depends directly upon only two values 
of the variable. A more efficient way of using it is to find the
mean range of a number of small samples of equal size as has
already been described (p. 19 above). The samples should be 
selected in some random way from the data available, and, if 
possible, at least 6 samples should be taken. As the mean 
range of the sample varies with the sample size it is important 
that equal sample sizes should be used when the two disper- 
sions are being compared. As an alternative (as an aid to the 
teacher but not for classroom use) the factors of Table 2 may 
be helpful; a rough estimate of the standard deviation of the 
distribution is given by multiplying the mean sample range by 
the given factor. 
TABLE 2
Factors for converting mean ranges to rough estimates of the
standard deviation

Sample size     5, 6, 7     8, 9, 10
Factor          0.4         0.33

8. FREQUENCY DISTRIBUTIONS 
When we are confronted by a large number of observational
results it may be difficult to grasp the main features of the
data, that is, to ‘see the wood for the trees’. A frequency
distribution is an orderly arrangement of the data. For example,
a chemistry master suspected that his class of 14-year-old boys 
were being careless in their readings of the burettes they were 
using in an experiment and the use of which had already been 
explained to them. To check their accuracy several burettes 
were set up with different levels of liquid and the boys were 
asked to move from one burette to the next, noting and record- 
ing their readings. The following is a record of the readings set 
down by 25 boys for a burette which, according to the chemistry
master, had a correct reading of 23.31 c.c.:

23.35 23.35 23.30 23.30 23.35
23.30 23.30 23.40 23.35 23.32
23.38 23.30 23.35 23.30 23.30
23.35 23.32 23.40 23.35 23.30
23.35 23.32 23.35 23.30 23.30




At a glance this table of results conveys very little apart 
from the fact that the readings are unequal. To see more 
clearly what is happening it helps to arrange the results as a
‘frequency distribution’:

Reading        23.25   23.30   23.32   23.35   23.38   23.40   Total
No. of boys      2      10       3       7       1       2      25


With the help of a diagram of the liquid meniscus in the burette 
tube it was possible to show both the faults of individuals and 
the faults of the class as a whole. The main mistakes can be 
seen to be: 

(a) Insufficient care in positioning the eye. 

(b) Rounding off to the nearest unit or half unit. 

(c) Possible miscounting of the unnumbered divisions of
0.1 c.c.

Such informal introductions to frequency distributions, 
especially if they are related to some work of the class, will 
simultaneously explain what they are and how they are used, 
before they are considered in a more formal way. 

The next step is to introduce frequency distributions of 
grouped data and their representation by histograms. For this 
purpose examination marks on the scale 0-100 are suitable,
but at least 100 marks should be available. The marks are 
read out and plotted, without grouping, one by one. The result- 
ing chart will show the features of the distribution in such 
detail that it is not easy to describe it simply in a general way. 
A less detailed chart might be more effective—so let us group 
the marks in equal class intervals of 5 marks, 1-5, 6-10, ... 
96-100. (If any 0’s are to be accounted for, it is better for the 
present to include them in the class 1-5 to avoid unequal class
intervals at this stage.) The results are tabulated: 


Mark        Frequency
1-5         . .
6-10        7
11-15       . .
etc.

and the ‘histogram’ is drawn, as though it were a bar-chart,
with columns of height proportional to the corresponding fre- 
quencies as in Fig. 5. There is no particular reason for selecting 
a class interval of 5 marks, and as it is interesting to compare 
the histograms obtained by using different class intervals, the 
members of the class should be allowed some choice from 2, 4, 
5, 10, 20 and 25 marks, which all give sets of equal class inter- 
vals for the range 1-100. Discussion of the results should show 
that a total of about 20 classes displays the results in sufficient 
detail for most purposes: if the class intervals are too large, 
important features may be masked. 
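
The grouping can be tried with several class intervals in turn. The following is a minimal sketch: the function name and the short list of marks are hypothetical, and a mark of 0 is placed in the lowest class as suggested above.

```python
def group_marks(marks, width):
    """Frequency of marks (0-100) in equal classes 1-width, width+1-2*width, ..."""
    if 100 % width != 0:
        raise ValueError("the class interval must divide 100 exactly")
    counts = [0] * (100 // width)
    for mark in marks:
        k = (max(mark, 1) - 1) // width      # a mark of 0 joins the lowest class
        counts[k] += 1
    return {f"{k * width + 1}-{(k + 1) * width}": c for k, c in enumerate(counts)}


# Hypothetical marks; a real exercise would use at least 100.
marks = [0, 3, 5, 6, 10, 11, 100]

by_fives = group_marks(marks, 5)
by_tens = group_marks(marks, 10)

# A rough histogram printed sideways, one star per mark.
for label, frequency in by_tens.items():
    print(f"{label:>7} {'*' * frequency}")
```

Running the same marks through intervals of 2, 4, 5, 10, 20 and 25 shows how the finer groupings keep detail that the coarser ones hide.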


Fig. 5. Histogram of distribution of marks.


The next step is to emphasise the characteristic area-property 
of the histogram by the representation of frequency distribu- 
tions which have unequal class intervals. The principle is that 
the areas of the columns must be proportional to the cor- 
responding frequencies; the heights of the columns are propor- 
tional to the frequencies only for the special case of equal class 
intervals. Suitable data can be found in the Annual Abstract 
and can be chosen to illustrate histograms of different shapes, 
e.g. V-shaped, J-shaped and skew. This area-property of histo- 
grams is of great importance for later work in sampling and 
probability theory. 

An example with the calculations is set out in Table 3. The
data illustrate some striking changes which are best brought
out by drawing the two histograms on the same diagram. A
difficulty arises about the last interval which is ‘open’; unless
more facts are given only a judicious guess can be made
about it.

TABLE 3

Deaths in England and Wales; a comparison of age at death
in 1871 and 1946

                                  Column dimensions
Age        1871     1946      Base        Heights
                                        1871     1946
0-1        126       33         1       126      33
1-2         41        2         1        41       2.4
2-4         40        3         3        13.3     1.0
5-14        31        5        10         3.1     0.5
15-24       30        8        10         3.0     0.8
25-34       34       13        10         3.4     1.3
35-44       33       20        10         3.3     2.0
45-54       35       39        10         3.5     3.9
55-64       42       76        10         4.2     7.6
65-74       48      127        10         4.8    12.7
75 +        49      162         ?         ?       ?
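
The column heights of Table 3 follow from the area rule: height = frequency ÷ base. The sketch below reproduces the 1871 heights from the tabulated frequencies and bases; the function name is hypothetical and the open class ‘75 +’ is left out because its base is not known.

```python
# (age class, deaths in 1871, base of column in years), as tabulated above.
classes_1871 = [
    ("0-1", 126, 1), ("1-2", 41, 1), ("2-4", 40, 3), ("5-14", 31, 10),
    ("15-24", 30, 10), ("25-34", 34, 10), ("35-44", 33, 10),
    ("45-54", 35, 10), ("55-64", 42, 10), ("65-74", 48, 10),
]


def column_heights(classes):
    """Height of each column so that its area is proportional to the frequency."""
    return [round(frequency / base, 1) for _, frequency, base in classes]


heights = column_heights(classes_1871)

for (label, frequency, base), height in zip(classes_1871, heights):
    print(f"{label:>6}  frequency {frequency:>3}  base {base:>2}  height {height}")
```

Plotting the frequencies themselves as heights would make the ten-year classes look far more important than the single years of infancy, which is exactly the error the area rule prevents.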


A rather more sophisticated exercise is to draw the histogram 
of a skew distribution, first on a linear base and then on a 
logarithmic base or, alternatively, on semi-logarithmic paper. 
Data suitable for this exercise will be found in the Annual 
Abstract, e.g. distribution of incomes liable to surtax. 

The computation of the mean of a frequency distribution is 
shown in Table 4. This tabulated computation is easy to teach 
as a technical trick; it should be omitted if it is unlikely that
its underlying principles will be understood. At this stage only 
distributions with equal class intervals and not more than 10 
classes should be used. For this tabulated work squared 
paper (}-inch) is very helpful in keeping the work tidily 
arranged.


TABLE 4
Computation of the Arithmetic Mean

                                     Deviation from
                                      working mean
Mark       Frequency   Class       Marks     Class        f × d
              (f)      centre       [3]     intervals
                       (x) [1]                 (d)

1-10           3         5½         - 40       - 4         - 12
11-20          8        15½         - 30       - 3         - 24
21-30         12        25½         - 20       - 2         - 24
31-40         15        35½         - 10       - 1         - 15
41-50         16        45½           0 [2]      0            0
51-60         20        55½         + 10       + 1         + 20
61-70         12        65½         + 20       + 2         + 24
71-80          9        75½         + 30       + 3         + 27
81-90          4        85½         + 40       + 4         + 16
91-100         1        95½         + 50       + 5         +  5

             100                                     - 75   + 92

                                              Total        + 17 [4]


NOTES (see corresponding reference numbers in Table 4).

1. If this column is omitted in the first examples many 
mistakes arise. The entries in this column stress the fact that 
by grouping into classes we are placing all the marks of each 
class at the centre of that class. It is a useful exercise to show 
by examples that this approximation leads to negligible error 
if the curve is roughly symmetrical. 

2. The working mean is chosen as a guess at the centre of 
the class which appears most likely to contain the mean. If 
the guess is wrong it does not matter, but the more nearly 
correct it is the less computing work there is to do. 

3. This column can also be omitted after the first few exer- 
cises. It acts as a reminder that the units of the computation 
are being changed for the last two columns. 


4. The mean is given algebraically by

$$\bar{x} = a + C\,\frac{\sum fd}{\sum f},$$

where $a$ = working mean
and $C$ = class interval.

The expression $\sum fd / \sum f$ should be recognised as a weighted
mean in which the weights are the ‘frequencies’ of the second
column.

The computation is completed thus:

Mean = working mean + 17/100 class intervals
     = 45½ + (17/100 × 10) marks
     = (45.5 + 1.7) marks
     = 47.2 marks.

The first few exercises should be arranged so that some give
exact values of the mean; it is then possible and important to
verify that ‘the sum of the deviations from the mean is zero’
exactly, i.e. that

$$\sum f(x - \bar{x}) = 0.$$
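
The tabulated computation can be reproduced as a check. The sketch below uses the frequencies and class centres of Table 4; the variable names are hypothetical. It compares the working-mean method with the direct weighted mean and verifies that the weighted deviations from the mean sum to zero.

```python
frequencies = [3, 8, 12, 15, 16, 20, 12, 9, 4, 1]
centres = [5.5 + 10 * k for k in range(10)]     # 5.5, 15.5, ... 95.5
working_mean = 45.5
class_interval = 10

# Deviations from the working mean, in class intervals: -4, -3, ... +5.
d = [round((x - working_mean) / class_interval) for x in centres]

n = sum(frequencies)
sum_fd = sum(f * dev for f, dev in zip(frequencies, d))

mean = working_mean + class_interval * sum_fd / n
direct_mean = sum(f * x for f, x in zip(frequencies, centres)) / n
residual = sum(f * (x - mean) for f, x in zip(frequencies, centres))

print(n, sum_fd, round(mean, 1), round(direct_mean, 1))
```

A different working mean, say 55½, changes every entry in the last column but leaves the final mean unchanged, which illustrates Note 2.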

As a measure of dispersion the standard deviation is to be 
preferred, but defined as ‘the square root of the mean of the 
squared deviations from the mean’ it sounds rather formidable 
to 15-year-old children. Though its computation requires the 
addition of only one more column to those of Table 4, there is 
no point in using it unless it can be thoroughly comprehended. 
For this stage therefore the mean deviation from the mean is 
probably the simplest measure to use. Unless the error be- 
tween the working mean and the true mean is greater than half 
a class interval the mean deviation can be found with sufficient 
accuracy directly from the tabulated computation of the mean. 

Thus for the data of Table 4:

Mean deviation = (75 + 92)/100
               = 1.67 class intervals
               = 16.7 marks.
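
The size of the approximation can be examined directly. In this sketch, which again uses the Table 4 frequencies with hypothetical variable names, the deviations are taken first from the working mean, as in the tabulated short-cut, and then from the calculated mean.

```python
frequencies = [3, 8, 12, 15, 16, 20, 12, 9, 4, 1]
centres = [5.5 + 10 * k for k in range(10)]
working_mean = 45.5
n = sum(frequencies)

# Short-cut: absolute deviations measured from the working mean.
approx_md = sum(f * abs(x - working_mean) for f, x in zip(frequencies, centres)) / n

# Fuller calculation: absolute deviations measured from the mean itself.
true_mean = sum(f * x for f, x in zip(frequencies, centres)) / n
exact_md = sum(f * abs(x - true_mean) for f, x in zip(frequencies, centres)) / n

print(round(approx_md, 1), round(exact_md, 2))
```

For these data the two values differ by little more than a tenth of a mark, which supports the use of the short-cut when the working mean lies within half a class interval of the mean.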




Neither the computation of the mean nor of a measure 
of dispersion is an end in itself, except possibly for the one 
or two examples required to illustrate the technique of com- 
puting them; they should be computed only when they are to 
be used in comparing the properties of two or more frequency 
distributions. 


9. CORRELATION

Almost all graphs drawn in school are those of simple mathe- 
matical functions or of experimental results which can be 
expected to give good straight lines or simple smooth curves. 
It is therefore instructive to consider some variables which are 
not so exactly related, e.g. the heights and weights of the 
members of the class, or their marks in, say, mathematics and 
physics. If such bivariate distributions are plotted on graph 
paper, rather dispersed but not entirely featureless patterns of 
points will be obtained. Though it may not be possible to 
draw at sight one straight line which can adequately represent 
the association between the two variables, yet it may be pos- 
sible to say that, on the whole, high and low values of one 
variable are frequently combined with high and low values
respectively of the other. Such distributions are of great
importance in statistics; a simplified analysis by graphical
methods as described below is possible for 15-year-old children. 

When two variables x and y are plotted on a graph in the 
usual way, one of many possible patterns will result, though it 
is possible to arrange the patterns into a few typical groups. 
Six types of pattern are shown in Fig. 6. The first, Fig. 6 (a), is 
the graph of an algebraic relation of the form y = mx + c 
where m and c are constants. Every pair of values of x and y 
lies exactly on the straight line. Lines as exact as this are
mathematical ideals which, though important in geometry, are
never realised in plotting practical measurements. Fig. 6 (b)
illustrates the plotting of results obtained in careful experiments
when some linear relation between two physical variables
is being investigated. The plotted points lie in a narrow
band about an ideal straight line which is usually drawn ‘by
eye’. The plotting of pairs of biological measurements, e.g. the
age and height of the members of the class, gives a greater
scatter of points. The pattern shown in Fig. 6 (c) is of this
type; though the points are widely scattered it can be said that,
roughly speaking, high values of x are associated with high
values of y, and low with low. In such a case there is said to be
some linear association between x and y. As the degree of
association increases the pattern becomes more like that of
Fig. 6 (b): as the degree of association decreases the pattern
becomes more like that of Fig. 6 (d) in which there is no
discernible association between x and y; any selected value of x
(or y) may equally well be found with any value of y (or x),
high, medium or low. The relation between two closely related
variables may not always approximate to the straight line
form, however. In Fig. 6 (e) the plotted points lie closely
about a definite curve. In such a case it is often possible and
useful to ‘transform’ this curve into a straight line by plotting
suitable simple functions of x and y. Fig. 6 (f) shows a form of
linear association in which high values of y are found with low
values of x and vice versa. This kind of association is said to
be negative to contrast it with the positive associations
illustrated by Figs. 6 (b) and 6 (c).

Fig. 6. Six types of two-variate association: (a) Exact linear
(mathematical); (b) Linear association (e.g. practical physics);
(c) Linear association (e.g. practical biology); (d) No association;
(e) Curvilinear association; (f) Negative association.

Though it is easy to distinguish the main types of scatter 
diagrams described above, it would be useful to have some 
numerical measure which could describe these different types 
in a simple way. As the linear type is the most important in 
practical affairs we need concern ourselves with this type only. 
Our problem is thus to find some way of assigning a number 
which can be used to measure the degree and sign (positive or 
negative) of the linear association of two variables. A simple 
coefficient used by statisticians is the coefficient of correlation. 
The coefficient is equal to + 1 for exact positive linear associa- 
tion; it is equal to — 1 for exact negative linear association; 
and it is zero if there is no association at all. For intermediate 
degrees of linear association it can take any value between + 1 
and 0, or between 0 and — 1. This coefficient is defined in 
mathematical terms. Its calculation however is rather laborious
and for many cases in which it is useful an approximate
value of it can be obtained graphically.


Assume that we have a set of corresponding mathematics 
and physics marks available, both sets in the scale 0-100. The 
graphical procedure is: 

(a) Plot the pairs of marks on graph paper. 


Fig. 7. Regression lines as loci of means of arrays.


(b) Find the means of the mathematics marks, $\bar{x}$, and of
the physics marks, $\bar{y}$. Plot the point ($\bar{x}$, $\bar{y}$) (Fig. 7).

(c) Consider the marks in the class intervals 1-10, 11-20, 
21-30, etc. For the mathematics marks 11-20, for example, 
there is a distribution of physics marks; plot the mean of this 
distribution by a cross on the line corresponding to the
mathematics mark 15½. The mean may be calculated but it is usually
possible to estimate it sufficiently accurately by eye. Repeat 
this plotting of the means for all the columns (Fig. 7). 


(d) Draw through ($\bar{x}$, $\bar{y}$) a straight line passing as nearly as
possible ‘through’ the means of the columns. This line is called 
the line of regression of y (physics marks) on x (mathematics 
marks) (Fig. 7). 

(e) Similarly for each row (1-10, 11-20, 21-30, etc., of
physics marks) plot on its centre line the mean of the distribu- 
tion of mathematics marks in that row. Use a small circle to 
indicate these means. 

(f) Through ($\bar{x}$, $\bar{y}$) draw a line passing as nearly as possible
‘through’ the means of the rows. This line is the line of regression
of x (mathematics marks) on y (physics marks).

(g) Find the slopes $\theta_1$, $\theta_2$ of the two regression lines ($\theta_1$ for
the line of y on x, $\theta_2$ for the line of x on y, both measured
from the x-axis). From these slopes an approximate value of the
coefficient of correlation, $r$, can be found. It is given by

$$r = \sqrt{\frac{\tan\theta_1}{\tan\theta_2}},$$

for reasons explained later.

If the trend is for the two variables to increase together both
tan $\theta_1$ and tan $\theta_2$ are positive and the positive value of the
root is taken; if one variable decreases while the other increases
both slopes are negative and a minus sign is assigned to $r$. It
will be found that values of $r$ lie between + 1 and - 1. The
smaller the angle between the two regression lines (for equal
x- and y-scales) the more closely are the variables correlated,
positively or negatively; if $r$ is zero, or approximately so, the
variables are said to be uncorrelated, and the angle between
the two regression lines is almost a right angle.

When interpreting the correlation coefficient it must be 
remembered that a high value of r does not imply that there is 
necessarily a causal relation between the two variables; a low 
value of r will certainly be obtained if the variables are com- 
pletely independent, but it may also arise if the variables are 
related in some way other than linearly, e.g. parabolically. 
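
The last point can be illustrated with a small calculation. The sketch below assumes the usual product-moment definition of r (the text obtains r graphically) and uses an invented set of values lying exactly on a parabola; the function name `correlation` is hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = list(range(-5, 6))          # -5, -4, ..., 5
ys = [x * x for x in xs]         # y depends exactly on x, but parabolically
r = correlation(xs, ys)
print(r)                         # zero, although y is completely determined by x
```

Here y is fixed by x with no scatter at all, yet r vanishes because the dependence has no linear component.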

As the graphical method described above is applicable only
to large samples (e.g. at least 100 pairs) sets of marks are likely
to provide the most interesting ready-made data for class use.
Questions about the significance of the correlation coefficient
must be deferred, and it would be wise to use in exercises only
data for which |r| > 0.3. (The 5% significance level of r for
100 pairs is 0.2 approximately.)

Children are always surprised to find that the graphical 
method described produces two lines instead of one, and some 
explanation of this phenomenon will be demanded. A bivariate 
distribution can usually be considered from two points of view, 
what we may call the ‘x’ and the ‘y’ points of view. The 
mathematics master considering the pairs of marks used in 
our example will have the x point of view; he will ask himself 
‘How do the physics marks compare with the mathematics
marks?’ In other words, he will use the mathematics marks as
the basis of comparison and consider how the physics marks
vary for given marks in mathematics. The line of regression of
y on x summarises the distribution for him. The physics master,
on the other hand, will consider the marks from the y point of
view: ‘How do the mathematics marks compare with the
physics marks?’ For him the line of regression of x on y
summarises the distribution. The two lines are therefore an expres-
sion of the fact that there are at least two different ways of 
considering the marks. 

The coefficient of correlation is a means of combining these
two points of view into one expression about which both
masters can agree; it is the geometric mean of the slopes of the
two lines of regression though it may not appear to be so in our
formula. The reason is that the line of regression of y on x is
usually expressed in the form y = ax + c so that a = tan $\theta_1$,
but the line of regression of x on y is expressed in the form
x = by + d so that the slope b must be measured from the ‘y’
point of view. Hence

$$ b = \tan(90^\circ - \theta_2) = \cot\theta_2 = \frac{1}{\tan\theta_2}, $$

$$ r = \sqrt{ab} = \sqrt{\frac{\tan\theta_1}{\tan\theta_2}}. $$
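
The relation r = √(ab) can be verified arithmetically. In the sketch below the two slopes are found by least squares rather than by eye, which is a substitution made for the sake of a checkable example; the marks and the name `regression_slopes` are likewise assumptions.

```python
from math import sqrt

def regression_slopes(xs, ys):
    """Return (a, b, r): a is the slope of y on x (y = ax + c),
    b is the slope of x on y (x = by + d), r is the correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / syy, sxy / sqrt(sxx * syy)

# Hypothetical pairs of marks, for illustration only.
maths   = [35, 42, 50, 58, 61, 70, 77, 84]
physics = [40, 38, 55, 52, 66, 63, 80, 78]

a, b, r = regression_slopes(maths, physics)
print(a, b)
print(r, sqrt(a * b))    # the two values agree when both slopes are positive
```

When both slopes are negative the same product ab is obtained and the minus sign is attached to the root, as described earlier.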


The kind of data readily available in schools which is suit- 
able for exercises in correlation depends very much on the kind 
of work done in the school. If the school has well-equipped
laboratories, or a school garden with an experimental plot, or a 
meteorological station, almost every class experiment provides 
some data worth a closer analysis than it is usually given. It 
is not suggested that more weighing and measuring be done 
(there is perhaps too much of this already) but that the results 
of some class experiments should be analysed as a whole rather 
than be scattered and lost as 30 or 40 isolated results separ- 
ately recorded in individual notebooks. For example, in the 
elementary chemistry experiments in which a weighed quan- 
tity of a substance loses weight on heating, the class results can 
be plotted on graph paper, weight heated (x) against weight
remaining (y), so long as the weight heated has a fair range of 
values. By this means the teacher of chemistry avoids some 
otherwise inevitable teaching of arithmetic and yet gains more 
than he loses in trying to explain the quantitative facts, while 
each pupil benefits by sharing his results with others. An 
exactly similar procedure may be followed in practical biology 
when the class is estimating the ‘loosely-held water’ (i.e. the 
percentage loss of weight on air-drying) in samples of the same 
soil specimen. 

Where laboratory facilities are limited, the weight of objects 
up to 1 lb. (or 500 gm.) can be quickly determined to within 
ł oz. (or 10 gm.) by using a spring or lever letter balance. The 
measurement of lengths up to 12 inches (or 30 cm.) to the 
nearest 0.1 inch (or 1 mm.) should afford no difficulties. Many
easy quantitative experiments on rate of growth, effect of fer- 
tilisers and hormones, germination, etc., requiring only simple 
apparatus, are described in Simple Experiments in Biology by 
Cyril Bibby (Heinemann, 8s. 6d.).* 

An occasional spurious correlation can be both amusing and
instructive.

* See also H. Kalmus, Simple Experiments with Insects (Heinemann,
7s. 6d.).


10. PROBABILITY AND SIMPLE PROBABILITY CALCULATIONS

It has been argued that the subject of probability is unfit to
be mentioned in school because it is still beset by logical and
philosophical difficulties. If this argument were generally
applied to the subject matter of the school curriculum very 
little of that curriculum would remain. Primary school courses 
of arithmetic do not begin with a study of Principia Mathe- 
matica, nor is it necessary to begin the study of probability with 
a critical examination of its possible definitions. Education at 
the school age rests mainly on the child’s intuitive acceptance 
of ideas rather than on their logical necessity. If it is accepted 
that a more critical approach can be made only when the fine 
logical distinctions are appreciated, there is no reason why an 
introduction to probability should not be given at a much 
earlier age than it is at present. If the subject of probability is 
taught in schools at all it is usually as an unattached addendum 
to the Sixth Form algebra course for mathematical specialists, 
and then it is restricted to a brief treatment of the classical 
theory. In the suggested treatment of probability outlined 
below an attempt is made to relate the child’s experience of 
everyday affairs to a subject which has many accepted prac- 
tical applications. 

Why do captains of football and cricket depend on the toss 
of a coin to decide who shall play the first half with the wind 
or take first innings on the batsman’s wicket? It seems to be 
generally accepted as the fairest way of beginning a game. But 
what do we mean by ‘fair’? Let us look at the coin more 
closely. Would it be fair to use a coin with two heads? or one 
that is bevelled or weighted to give heads more often than 
tails? No, we want to give each side an equal chance. What, 
then, do we expect of the coin? What are we taking for granted 
when we use a coin to toss for first innings? We are assuming
that, if the coin were tossed a large number of times, it would 
give about as many heads as tails. If we tossed a penny 
200 times would we expect it to give exactly 100 heads and 
100 tails? No, it might happen to do so but we should not 
suspect the penny of being unfair or biassed if it gave 101 
heads, or 105 heads, or 110 heads—but here we begin to doubt. 
(It is not necessary to decide where the line should be drawn, 
so long as the points are established that the number of heads
and tails need not be exactly equal in a finite number of throws,
and that we do not expect an exact alternation of heads and
tails in successive throws.)
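
For the teacher's own reference, the chance of exactly 100 heads in 200 tosses can be calculated. The sketch below assumes a fair coin and independent tosses and anticipates the binomial distribution, which is not needed at this stage of the class discussion; the name `prob_heads` is hypothetical.

```python
from math import comb

def prob_heads(k, n=200):
    """Probability of exactly k heads in n independent tosses of a fair coin."""
    return comb(n, k) / 2 ** n

p_exact_100 = prob_heads(100)
p_90_to_110 = sum(prob_heads(k) for k in range(90, 111))
print(round(p_exact_100, 4))     # chance of exactly 100 heads
print(round(p_90_to_110, 4))     # chance of 90 to 110 heads inclusive
```

An exact 100-100 split turns out to be rather unlikely, while a count within ten of 100 is much the commoner result, which supports the informal argument above.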

It is now time to express in mathematical terms the fact 
that a fair coin will give, ‘in the long run’, about as many heads 
as tails. We say, mathematically, of an event that is impos- 
sible that its probability is 0, of an event that is certain that 
its probability is 1. What can we say about the probability of 
getting a head or a tail with a fair coin in a fair throw? The 
probability of getting one or the other is 1, because a coin
dropped on a flat floor will fall with one side uppermost; the
probability of getting neither is 0, because a penny thrown into
the air does not remain spinning in the air indefinitely. We
have to say that the probabilities of a head or of a tail are
equal and that together they total 1; they must each be ½.
If the coin were biassed so that it gave, in the long run, more
heads than tails, say 6 heads for every 4 tails, we would say
that the probability of a head is 0.6, and of a tail 0.4.

The joy of playing snakes and ladders, ludo, and other games
with dice is that one never knows what is going to happen
next; with every throw we put ourselves at the mercy of
chance. Let us look at a die more closely. It is approximately
cubical, with its six sides numbered 1 to 6. What do we expect
of a fair die? If we threw it many times, we should expect it
to give roughly equal numbers of 1’s, 2’s, 3’s, 4’s, 5’s and 6’s,
i.e. there should be equal chances of getting each of the numbers.
As the probability of getting some number in a throw
is 1, the probability of getting a particular number with a fair
die is ⅙.

Could the two captains use a die instead of a coin to decide
whose team should bat first? They need to make its use as
simple as possible, i.e. one throw, and yet they need to be fair
to both captains. If the die is unbiassed, what about the
number of odd and even numbers thrown in the long run? They
will be roughly equal and therefore the probability of throwing
an even number, or of throwing an odd one, is ½. The ‘toss’ could
therefore be decided by calling odds or evens with a die instead of
heads or tails with a coin. Notice that we can get the same result by
saying that the chance of throwing either a 1, a 3, or a 5 is
⅙ + ⅙ + ⅙ = ½. What then are the probabilities of throwing
(a) a 6, (b) any number except 6, (c) a number greater than 4,
(d) a number less than 5?
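
Answers to questions of this kind can be checked by counting favourable faces. The following is a minimal sketch assuming a fair six-sided die; the helper `prob` is a hypothetical name.

```python
from fractions import Fraction

FACES = range(1, 7)          # a fair die: faces 1 to 6, all equally likely

def prob(event):
    """Probability that one throw of a fair die satisfies `event`."""
    favourable = sum(1 for face in FACES if event(face))
    return Fraction(favourable, len(FACES))

print(prob(lambda f: f % 2 == 1))   # an odd number: 1/2
print(prob(lambda f: f == 6))       # (a) 1/6
print(prob(lambda f: f != 6))       # (b) 5/6
print(prob(lambda f: f > 4))        # (c) 1/3
print(prob(lambda f: f < 5))        # (d) 2/3
```

Note that (a) and (b), and likewise (c) and (d), are complementary events whose probabilities total 1.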

If the two captains had neither a coin nor a die could they 
use a pack of cards to decide the ‘toss’? What should we like 
to know about the cards? Is it necessary to have a complete 
pack? etc. What is the probability of drawing from a shuffled 
pack (a) the ace of spades, (b) the two of diamonds, (c) a heart, 
etc.? 

At this stage a class experiment with dice could well be
undertaken, e.g. the comparison of the results obtained by
throwing a good die and another die which has been biassed
(e.g. by filing down one face). Two pupils can alternate in
throwing the die and announcing the result while the remainder
of the class individually record the results on prepared graph
paper. The significance of the phrase ‘in the long run’ becomes
clearer if, at convenient intervals, the proportionate deviations
of the observed results from those expected are compared.

So far only the probability of occurrence of single events
has been considered. An introductory treatment of compound
probability, i.e. the probability of occurrence of two or more
independent events, is possible.

What is the probability of winning the toss on two succes- 
sive occasions? To answer this let us begin by listing all the 
possible results of tossing a coin twice, namely, (H H), (H T),
(T H), (T T). All four possibilities are equally likely so the 
chance of calling any one of them correctly is ¼. The probability
of calling correctly in three successive throws can be
investigated in the same way but unless the binomial theorem
is already known it is not possible to generalise the probability
distributions of heads and tails. The following question on the
topic is sometimes asked: If a penny has given 100 heads in
100 tosses what is the probability that the next toss will also
give a head? The answer usually expected is ½, but a penny
that behaves in such a way is almost certainly biased. Without
further information one can only say that the probability
of another head is much nearer to 1 than to ½.
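
The listing of possible results for two or three tosses can be produced mechanically. This is a minimal sketch assuming a fair coin, so that every sequence is equally likely; the function name `outcomes` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def outcomes(n_tosses):
    """All equally likely sequences of heads (H) and tails (T)."""
    return ["".join(seq) for seq in product("HT", repeat=n_tosses)]

two = outcomes(2)
three = outcomes(3)
print(two)                          # ['HH', 'HT', 'TH', 'TT']
print(Fraction(1, len(two)))        # 1/4: calling two tosses correctly
print(Fraction(1, len(three)))      # 1/8: calling three tosses correctly
```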

The probable results of throwing two dice are interesting and
to children rather surprising. With one die there is an equal
chance of throwing any of the numbers from 1 to 6. With two
dice we throw any of the numbers from 2 to 12; are they
equally probable? A chess-board scheme displays the possibilities
in the clearest way (Fig. 8).

                     Die A
             1    2    3    4    5    6
        1    2    3    4    5    6    7
        2    3    4    5    6    7    8
 Die B  3    4    5    6    7    8    9
        4    5    6    7    8    9   10
        5    6    7    8    9   10   11
        6    7    8    9   10   11   12

Fig. 8. Combination of two dice scores.

All the arrangements shown are equally likely and therefore the
‘probability distribution’ of the throws of two dice is:


Sum of the two dice . |  2   |  3   |  4   |  5   |  6   |  7
Probability . . . . . | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36

Sum of the two dice . |  8   |  9   |  10  |  11  |  12
Probability . . . . . | 5/36 | 4/36 | 3/36 | 2/36 | 1/36


The probability of getting 7 in a double throw is therefore 
6 times as great as the probability of getting 12. These and
similar results which arise from the drawing of cards from sets
numbered 1-10 (e.g. playing cards) are better appreciated if
the expected results are compared with some experimental
results. In using cards, those that are drawn must be replaced
and carefully reshuffled between each draw to make the drawing
process exactly analogous to the throwing of dice. Cards
soon tend to stick in pairs; dice are the easier and quicker to
use.
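
The chess-board count for two dice can be reproduced by enumerating all 36 arrangements. A minimal sketch, assuming two fair dice; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Count how many of the 36 equally likely arrangements give each total.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
distribution = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total in distribution:
    print(total, f"{counts[total]}/36")

print(distribution[7] / distribution[12])   # 6
```

The expected frequencies for a class experiment of N double throws are obtained by multiplying each probability by N.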

There is another aspect of probability which will intrigue 
children who like noting registration numbers of cars and ‘log- 
ging’ numbers of railway engines. Here is a game. One-tenth 
of all registration numbers end in the same digit, let us say 5. 
Would you expect precisely one last digit of 5 in every ten cars? 
No. What would you expect? Ask the children to find out by 
actually counting tens of motor-cars and scoring the number 
in each ten which have the selected final digit. The distribution
of a large number of such scores will be found (if the
observers are honest) to follow a regular pattern, though of
course not exactly. The regular pattern is given by the
‘probability distribution’ (which in this case is approximately the
Poisson distribution with a mean equal to 1) shown in Table 5.


TABLE 5

Probability distribution of car registration numbers ending
in a given digit in 100 random groups of 10 cars

No. in group |  0 |  1 |  2 |  3 |  4 | 5 or more | Total
Frequency .  | 37 | 37 | 18 |  6 |  2 |     0     |  100

It is usually surprising to children to find that so ‘chancy’ a 
game reduces, in the long run, towards a mathematical 
regularity.
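
The frequencies of Table 5 agree with a Poisson distribution of mean 1, rounded to whole numbers. Strictly, the count in a group of 10 cars follows a binomial distribution with n = 10 and p = 0.1, to which the Poisson form is a close approximation. The sketch below compares the two; the function names are hypothetical and final digits are assumed to be equally likely and independent.

```python
from math import comb, exp, factorial

def binomial(k, n=10, p=0.1):
    """Probability of k 'successes' among n cars, each with chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson(k, mean=1.0):
    """Poisson probability of k occurrences when the mean is `mean`."""
    return exp(-mean) * mean ** k / factorial(k)

# Expected frequencies per 100 groups: binomial beside Poisson.
for k in range(5):
    print(k, round(100 * binomial(k), 1), round(100 * poisson(k), 1))
```

The two columns differ by about two groups per hundred at most, which is well within the variation a class will observe.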

A variant of this game of observing the last digit of a car
registration number is to select a final digit and then count
the number of cars that pass until the digit recurs. The number
is scored as 1 if the next car to pass is a ‘success’, as 2 if the
first car is a ‘failure’ and the second car is a ‘success’, and so
on. The distribution of these scores, in the long run, becomes a
geometric one. The probabilities of scores of 1, 2, 3, 4, . . . are
$\frac{1}{10}$, $\frac{9}{10}\cdot\frac{1}{10}$, $(\frac{9}{10})^2\cdot\frac{1}{10}$, $(\frac{9}{10})^3\cdot\frac{1}{10}$, . . . respectively. It is perhaps
surprising to find that the most probable score is 1, though the
average score is, theoretically, 10.
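
Both statements about the geometric distribution can be checked by direct summation. A minimal sketch, assuming each car independently has a chance of 1/10 of showing the chosen digit; `prob_score` is a hypothetical name.

```python
from fractions import Fraction

p = Fraction(1, 10)                 # chance that any one car is a 'success'

def prob_score(k):
    """Probability that the first success comes at the k-th car."""
    return (1 - p) ** (k - 1) * p

print([prob_score(k) for k in range(1, 5)])

# The mean score is the sum of k * P(k); partial sums creep up towards 10.
partial_mean = sum(k * prob_score(k) for k in range(1, 501))
print(float(partial_mean))
```

The probabilities fall steadily from the first term, so 1 is the most probable score, while the long tail of large scores pulls the mean up to 10.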


These simple games effectively illustrate how chance, usually
considered to be the negation of law, is itself in the long run
constrained by laws. The examples themselves are of course 
trivial and of no practical importance, but they illustrate laws 
of probability that are applied to many practical problems, e.g. 
telephone traffic, radioactive disintegration, airport traffic, 
estimating the dustiness of the air, etc. 

This is perhaps as far as probability can usefully be discussed 
in a first treatment. The course outlined above should be sup- 
plemented by additional numerical questions and by some 
further experimental work to resolve any doubts expressed by 
the class. 


11. SAMPLING 


Sampling is so common a practice that most schoolchildren
will have had some experience of it, though few will have given
the subject any serious thought. The following notes outline
a suggested first treatment.

Why do we take samples of things? One reason is to save
the time and energy that would be needed to test the whole 
bulk from which the sample is drawn, e.g. a wholesale dealer 
buying fruit and vegetables by the ton cannot afford to inspect 
every apple or potato he buys—he inspects merely a small part 
of the produce in some, perhaps not very systematic, way. 

Another reason is that it is sometimes impossible to sample a
property of an object without damaging or even destroying the 
object, e.g. the greengrocer cannot bite each of his apples to 
see if they are sweet, nor can a manufacturer of electric light 
bulbs test all his lamps to ensure that they have the guaranteed 
length of life. Hence sampling is a common and important way 
of estimating some property of a large bulk, or ‘universe’, by 
testing a small fraction of the bulk.

What are we assuming when we take a sample? It is clear
that we generally assume that the sample is fair, or ‘representa- 
tive’, that its properties are typical of the universe from which 
it is drawn. This requirement is not easy to ensure. When a 
sample is not typical or fair in some respect we say that it is 
‘biassed’. It is one of the tasks of statisticians to show how 
bias can be avoided and what risks we take by assuming that 
any sample is a fair one. 

Suppose we wished, for some purpose, to select a sample of 
20 boys from this school; how could they be selected to give a 
fair sample? Would it be fair to select them by going to the 
entrance gate in the morning and counting the first 20 that we 
see? No, because boys often come to school in groups, from the 
same bus or train, from the same district, from the same form, 
and so on; such a selection would be unlikely to give a fair 
sample. Would it be fair to ask one of the masters to name 
20 boys? Probably not; he would tend to name the boys 
known to him because they learn Greek or play in his cricket 
team. No, we need a method which we feel to be independent 
of personal bias and which gives every boy an equal chance of 
being selected. A well-known method of achieving this aim, 
used in lotteries, is to put the names (or numbers corresponding 
to them) of all the people concerned into a hat or drum, mix 
them well together, and then pick out as many names as are 
required. If this process is fairly carried out, each person has 
an equal chance of being selected; such a selection is called a 
‘random’ sample.

There is a serious objection that might be raised here. The 
random sample could give a result which is far from repre- 
sentative, e.g. it is possible, though improbable, that the 
random sample might give us 20 prefects or 20 members of one 
form. We shall return to this point later, but meanwhile this 
possibility could be avoided if we decide to take one boy from
each of 20 forms, selecting both the boys, and perhaps
the forms, in a random way. There are many possible
variations of this kind and each one has to be judged on its
merits.


Let us briefly consider various methods of sampling and see 
if we can find any sources of bias in them: 

(a) A sample of milk for test is taken by opening a milk 
bottle selected at random and filling a test-tube from it. 

(b) A ‘public opinion poll’ on the question of the Sunday 
opening of cinemas is conducted by asking, on a Sunday, the 
opinions of passers-by in the street near a cinema which is 
open. 

(c) A newspaper editor takes a poll on a political matter by
discussing it with 100 regular readers of his newspaper by
telephone.

(d) A man samples the potatoes in his garden by digging up 
a root at the end of each of three rows. 

It is very difficult to avoid bias in any method involving personal
selection; we are often unaware that we may be biassed,
and even if we do realise it that does not mean that bias is
avoided. As the method of picking names or numbers out of a
hat is not practicable if the universe to be sampled is very
large, we need a similar random method if we are to be sure of
eliminating personal bias. Such a method is provided by the
use of a prepared table of ‘random numbers’. Though tables of
random numbers for this purpose have been published, it is not
difficult to make a small set for our own use. We could, for
example, draw cards one by one from a pack of playing cards,
ignoring or discarding the ‘court’ cards, counting 10’s as 0’s,
and shuffling the pack before each draw. A quicker method
is to use a small 10-sided metal spinning top. In either
method the probability of any particular digit being selected
is 0.1. The set shown in Table 6 was obtained by the first
method.

To pick a random sample of 8 boys from a class of 35 each
boy is given one of the numbers from 1 to 35 in any convenient
way. The table is then consulted. It can be entered at any
point and the numbers can be read in any direction. For our
present purpose the quickest method would be to read along
the top line, mentally reading the numbers in pairs, thus: 37,
45, 09, 34, 36. Note any that are 35 or less, rejecting the others,
until we have eight numbers, e.g. 9, 34, 8, 22, 14, 31, 2 (repeated),
16. In this case the repeated 2 is ignored if we wanted
8 different boys, but as repeated numbers are always possible
the procedure to be adopted if they do occur should be decided
beforehand.


TABLE 6
Random numbers

[Table of 100 random digits; the digits cannot be recovered from the scanned page.]
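
The reading procedure just described can be set out step by step. The sketch below is an assumption about how the rule might be mechanised: the function name `pick_sample` and its parameters are hypothetical, the pair 00 is treated as a rejection, and only the ten digits quoted in the text (37, 45, 09, 34, 36) are used.

```python
def pick_sample(digits, population, size, allow_repeats=False):
    """Read `digits` in pairs and keep those that name a member 1..population.

    Pairs greater than `population`, and the pair 00, are rejected.
    Repeated numbers are skipped unless allow_repeats is True.
    """
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        if not 1 <= number <= population:
            continue                      # reject: no boy has this number
        if number in chosen and not allow_repeats:
            continue                      # rule for repeats, decided beforehand
        chosen.append(number)
        if len(chosen) == size:
            break
    return chosen

# Only the first ten digits of the top line are quoted in the text.
top_line = "3745093436"
print(pick_sample(top_line, population=35, size=8))   # [9, 34]
```

With only ten digits the sample is still incomplete; reading continues along the table until eight acceptable numbers have been found.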

It would be disappointing if no boy objected to the use of 
Table 6 (or its equivalent made in class) on the grounds that 
it contains too many 3’s or too few 2’s. The frequencies of the
digits occurring in the Table are:

Digit . . . . |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | Total
Frequency . . |  6 |  8 |  5 | 15 | 10 | 14 | 12 |  7 | 12 | 11 |  100

The objection cannot be upheld, at least in this example. The
table is in fact our first random sample. The method of making 
it is analogous to the usual method of drawing names from a
hat, except that the selected names are always returned after
being noted. The ‘hat’ therefore never empties however long
we continue drawing. There is therefore no reason to expect
exactly equal frequencies of digits after any specified number
of drawings, though we should expect the ratios of the frequencies
to become more and more nearly equal to 1 if the
process were continued indefinitely. Most of the remaining
doubts about the unequal frequencies can be dispelled by considering
the frequencies of another 100 drawings, e.g. the
frequencies produced by the same pack of cards in another set
of 100 random numbers were:


Digit . . . . |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | Total
Frequency . . |  9 |  5 | 10 | 11 | 16 |  8 |  7 | 11 |  8 | 15 |  100
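
For the teacher who wishes to go a step beyond the class discussion, the two sets of frequencies can be compared with the equal frequencies expected by means of the chi-squared statistic. This test is not part of the first treatment outlined here, and the sketch below is offered only as a check; the frequencies are those read from the two tables of digit counts, and 16.92 is the tabulated 5% point for 9 degrees of freedom.

```python
def chi_squared(observed):
    """Chi-squared statistic against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

first_set  = [6, 8, 5, 15, 10, 14, 12, 7, 12, 11]   # digits 0..9, Table 6
second_set = [9, 5, 10, 11, 16, 8, 7, 11, 8, 15]    # digits 0..9, second 100

CRITICAL_5_PERCENT_9_DF = 16.92    # tabulated value, not computed here

print(chi_squared(first_set), chi_squared(second_set))
```

Both values (about 10.4 and 10.6) fall well below the tabulated point, so neither set gives grounds for suspecting the pack of cards.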


So much for a technique of random sampling. The next 
questions are: What can we deduce from a sample? To what 
extent can we rely on a random sample giving us a fair repre- 
sentation of the bulk? In part, the answers to these questions 
can be found experimentally. Usually we take a sample 
because it is difficult or impossible to examine the whole 
universe from which it is drawn. As an experiment we are 
going to reverse the process; we are going to make up a uni- 
verse of a very simple kind and are then going to sample it 
and see what we get. (The apparatus required is a bag of 
marbles, or of beans, or of counters, in two colours. There are 
several varieties of French beans of about the same size and 
shape but of different colours.) It is assumed that black and 
white beans are used and that they are contained in a deep 
bag into which a boy can put both hands. 

Here are two heaps of beans, 800 black and 200 white. If the 
two heaps are put together in the bag and are thoroughly 
mixed we know that the percentage of white beans is exactly
20. We are now going to see what happens when we take
random samples from the bag. (Boys are invited to count out
samples of 5, keeping both hands in the bag and counting
from one to the other. The sample is brought out and the
number of white beans is noted. The sample is then returned
to the bag. If the sample results are recorded consecutively
they can later be grouped in consecutive pairs, etc., to see the
effect of taking larger samples.)

The results of 100 (or more) samples are then expressed as a
frequency distribution. The expected results, given approximately
by the binomial distribution, can be calculated (as a
check only) from the expansion of $100(0.8 + 0.2)^5$. Expected
and observed results are shown in Table 7. Some boys will still
be surprised to find that the samples of 5 do not always contain
precisely 1 white bean. In fact, the correct proportion of beans
was obtained only 46 times in 100, so that for this universe the
probability of getting the correct proportion in one sample is
about 0.46 or less than ½. A single sample of 5 would therefore
often mislead us, and mislead us seriously since the probability
of finding no white bean at all in a single sample is about ⅓. A
single sample of 5 is therefore not satisfactory for this particular
universe. How can we be sure of getting better results?
An obvious way is to take larger samples. The larger the
sample the closer are the estimates grouped about the correct
value as is shown in Fig. 9, which illustrates the effect of
taking groups of four, twenty and 100 successive samples
of 5.

Fig. 9. Effect of increasing sample size on estimates of a proportion.
[Panels: samples of 5, samples of 20, samples of 100, whole universe; horizontal axis 0% to 100%, vertical axis frequencies, correct value marked.]




TABLE 7
Results of sampling experiment

No. of white beans | Frequency expected | Frequency observed
        0          |         33         |         29
        1          |         41         |         46
        2          |         20         |         21
        3          |          5         |          3
        4          |          1         |          1
        5          |          0         |          0

In later work it will be shown that, roughly speaking, the pre- 
cision with which we can estimate the proportion of ‘black’ and 
‘white’, from a sample of size n, increases proportionally not
with n, but with $\sqrt{n}$.
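
The square-root rule can be seen in figures. The sketch below anticipates the later work by assuming the usual standard error of a sample proportion, √(p(1 − p)/n), with p = 0.2 as in the bean experiment; the function name and the sample sizes chosen are illustrative.

```python
from math import sqrt

def standard_error(n, p=0.2):
    """Standard error of the proportion of white beans in a sample of n."""
    return sqrt(p * (1 - p) / n)

# Each sample is four times the size of the one before.
for n in (5, 20, 80, 320):
    print(n, round(standard_error(n), 4))
```

Each fourfold increase in the size of the sample only halves the standard error, so doubling the precision costs four times the labour.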

To complete this introductory survey of sampling a second 
sampling experiment is required. This time the beans or 
marbles are mixed in an unknown proportion and estimates of 
the proportion are made by sampling. The differences between
the estimates and the correct value, found later by counting,
are known as ‘random sampling errors’.

If time is available there is plenty of scope for further 
elementary work in sampling, both theoretical and practical.
Some suggestions are here offered: 

(1) Examine a table of logarithms (or other mathematical 
tables) and discuss whether it could be used to provide random 
numbers. 

(2) What is the defect of the following method of producing 
a table of random numbers: Throw two dice and add their 
scores; ignore the 12’s and subtract 2 from all other scores? 
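
For the teacher's reference, the defect in suggestion (2) can be exhibited by enumeration. A minimal sketch assuming two fair dice; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

scores = [a + b for a in range(1, 7) for b in range(1, 7)]
kept = [s - 2 for s in scores if s != 12]    # ignore the 12's, subtract 2
counts = Counter(kept)

# Probability of each 'random' digit 0..9 under this method.
probabilities = {d: Fraction(counts[d], len(kept)) for d in range(10)}
for digit, p in probabilities.items():
    print(digit, p)
```

The digit 5 turns up six times as often as the digit 0, whereas a table of random numbers requires every digit to have the same probability of 1/10.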

(3) Estimate the mean height (weight, marks, etc.) of a class 
from a random sample and compare it with the correct result.

(4) Ask each member of a class to write down (a) his own
height, (b) his estimate of the mean height of the class.
Examine the results to see if tall boys over-estimate, and short
boys under-estimate, the mean.

(5) From a large number of picked common daisies take a 
small sample and estimate the number of ray florets in each 
flower. Compare the result obtained from the sample with the 
correct result (i.e. correct for the whole collection). 


12. CONCLUSION 

For the teacher who has had no academic training in 
Statistics, or practical experience of the subject, these notes 
will need to be supplemented by reference to elementary text- 
books of statistics. There are several suitable books available 
but it would be invidious to attempt to give a selection from 
them. Some will be found on the shelves of any public library 
or technical bookshop and a brief examination of them should 
enable the teacher to decide which are appropriate to his 
particular needs. 


SECTION II


TEACHING NOTES ON THE G.C.E. SYLLABUSES IN STATISTICS


1. INTRODUCTION 


The notes that follow are intended to help those teachers of 
mathematics who, with no previous experience of Statistics, 
would like to begin teaching the subject as a branch of applied 
mathematics to G.C.E. candidates. The order and selection of 
the notes have not been determined by any one of the existing 
syllabuses but broadly cover all of them, so that for any particular
examination only some of the topics mentioned will be
necessary. The notes must of course be considered merely as a
supplement to text-books on the general theory and it is assumed
that the reader has some of these books available. Even when
text-books written specially for school use begin to appear it 
will still be impossible to teach the subject satisfactorily with- 
out reference to more advanced works. Two works of reference 
on the elementary theory suitable for school work are: 


Yule and Kendall—An Introduction to the Theory of Statistics. 
Griffin. 14th Ed. (34s.) 

Weatherburn—A First Course in Mathematical Statistics. 
Cambridge. (15s.) 


These two books at least should find a place in the mathe- 
matical library of schools where Statistics is taught as an 
examination subject. Reference will be made to these books in 
the notes, using for them the abbreviations YK and W. It is 
important to have in addition some source of topical data, e.g. 
copies of the Annual Abstract of Statistics (H.M.S.O. 10s.) or of 
the Monthly Digest of Statistics (H.M.S.O. 2s. 6d.). 

Some practical work in sampling and in verifying probability 
distributions is essential if the fundamental ideas of statistical 
method are to be grasped by schoolchildren. The minimum
apparatus required is very simple—packs of cards, dice, bottles 
or bags containing prepared populations of beans, marbles or 
counters, etc.—but there is scope for ingenuity in designing 
simple pieces of apparatus that illustrate particular ideas or 
speed up sampling techniques. Any co-operation that the 
science staff can be persuaded to give is invaluable in providing 
‘live’ statistical data: in return, the school teacher of Statistics 
must show his science colleagues, and particularly the biologists, 
how statistical methods can assist the teaching of science. 

It is commonly thought by teachers of mathematics that the 
mathematics of Statistics is in some way very different from 
the mathematics that is usually taught in Sixth Forms. In 
fact Statistics provides a field for the application of all branches 
of Sixth Form pure mathematics except trigonometry and pure 
geometry and, as a calculus of the discrete variable, Statistics 
is a corrective to the present over-emphasis on the calculus of 
the continuous variable. Another criticism is that Statistics 
is not ‘mathematical’; in fact the content of any school 
theoretical Statistics syllabus is limited by the content of the 
parallel pure mathematics course. To extend the theory 
covered by these notes some acquaintance with Beta and 
Gamma functions, double integration, hypergeometric series, 
etc., is required. One aim of these notes is to indicate how
closely elementary Statistics is related to the usual school
course in pure mathematics.


2. DESCRIPTIVE STATISTICS 
2(a) Tabulation and Graphical Representation 

It is unnecessary to attempt a systematic and exhaustive 
treatment of these topics to begin with. Though they are im- 
portant, tabulation and graphical representation are necessary 
concomitants of all subsequent work, and therefore oppor- 
tunities of discussing them arise throughout the course. A high 
standard of neatness and lay-out of tables must be demanded, 
primarily as a means of reducing arithmetical errors. Paper 
ruled in -inch squares is useful; a foolscap loose-leaf file is 
convenient for holding exercises. Practice in the tabulation of
raw unclassified data is required, but opportunities arise in 
later work, e.g. sampling experiments. Some of the difficulties 
of tabulating and classifying data are illustrated by the foot- 
notes that accompany the tables in the Annual Abstract and 
Monthly Digest.


2(b) Frequency Distributions and Histograms


(i) The words variable and variate are not used consistently 
in Statistics; some writers use them as synonyms, others define
a variate as a variable which has a probability distribution.
The distinction is not important in school work and if it is made
the expression ‘bivariate distribution’ might imply more than
is intended. It is therefore best to use both words and to use 
them as synonyms. 

(ii) Some practice in the selection of suitable class limits
should be left to the pupils; for school work a total of 20 classes
is enough. It is conventional to add ½ to the frequencies of
each of the two adjacent classes should the value of the variable
coincide exactly with the value dividing the two classes.

(iii) The fact that grouping is a form of approximation is 
sometimes lost sight of. It is a useful exercise to begin with a 
large number of accurate readings of a continuous variable 
(e.g. barometric height) and to reduce them from, say, 200-300 
classes in steps finally to 2 or 3. The increasing roughness of 
the approximations is demonstrated by the corresponding 
histograms. 

(iv) The use of the descriptive terms symmetrical, J-shaped,
U-shaped, positive and negative skew, unimodal (or one-humped), 
bimodal (or two-humped) should be encouraged. 

(v) The fundamental property of the histogram is that the 
area of any column is proportional to the corresponding fre- 
quency. This point may be overlooked if frequency distribu- 
tions with equal class intervals only are used; the Annual 
Abstract provides many examples of distributions with unequal 
class intervals. 

(vi) The frequency polygon deserves little more than a 
mention. 


(vii) At this stage the drawing of two or three histograms is
sufficient to illustrate all the essential points and to give pupils
sufficient confidence in their use and interpretation. They will
be used continuously in later work. 

(viii) Logarithmic scales and logarithmic graph paper are 
now so widely used that they should be introduced when a 
suitable opportunity arises. Some manufacturers offer sample 
packages which include many types of logarithmic graph 
papers that are useful in other mathematical work (e.g. for 
home-made slide rules). 


2(c) Averages; Mode, Median, Mean (YK, Ch. 5) 

(i) In comparing two or more frequency distributions, or 
their histograms, some method is required of stating their loca- 
tion along the line which represents the values of the variate. 
It is useful to remember that we usually compare distributions 
whose histograms are roughly similar in shape. 

(ii) The mode is strictly applicable only to frequency curves, 
so that at this stage it is better to use the modal group; its 
limitations are obvious.

(iii) The median should be found both from its definition as 
the middle value (including the case of distributions with even 
numbers of values) and by graphical interpolation from the 
cumulative frequency diagram. In drawing this diagram note 
that the points corresponding to successive cumulative fre- 
quencies are plotted on the ordinates corresponding to the 
upper limits of the class intervals. The points are joined by a 
smooth curve (the so-called ‘ogive’). 

(iv) The general formula for the mean of a frequency distri- 
bution is an application of the weighted mean. Unless the use of 
the symbol $\Sigma$ is already familiar some introductory exercises
on its use are required. 

(v) The lay-out of the first computations of the arithmetic 
mean of a frequency distribution is described in Section I, 
p. 24.

(vi) The geometric mean is worthy of mention if only to emphasise
the arbitrariness of our choice of averages, and because
it may be needed for Index Numbers. Note that the logarithm 
of the geometric mean of a number of variables is the arith- 
metic mean of the logarithms of the variables. 

(vii) With the possible exceptions of some model examples 
to illustrate computing techniques, all exercises should be 
planned to provide some discussion. Averages are used as a 
means of comparison; the average of a single distribution 
unrelated to any other information is of little value or 
interest. 

(viii) Class exercises selected with a view to further analysis 
will save the tedium of unnecessary repetition of computations 
if they are filed and kept. 

(ix) The check provided by the identity $\sum f(d+1) = \sum fd + \sum f$
should be demanded as a matter of routine. 

(x) Some of the algebraic properties of the arithmetic mean 
should be noted. It is important to translate the symbols into 
words whenever they are used. Thus, if $m$ is the mean of $n$
values of $x$, then by definition $m = \sum x/n$. The identities
$\sum (x - m) = 0$, ‘the algebraic sum of the deviations about the
mean is zero’, and $\sum x = nm$, ‘the sum of variables is equal
to n times their mean value’, are frequently used in later work
and must be recognised at once.

(xi) For pupils who can appreciate it, the analogy between
the formula $m = \sum fx/\sum f$ for the mean of a distribution, and
$\bar{x} = \int yx\,dx / \int y\,dx$ for the abscissa of the mean centre of the
plane area enclosed between the curve $y = f(x)$ and the x-axis,
should be pointed out. Otherwise, the determination of the
arithmetic mean of a frequency distribution by finding the
ordinate about which its histogram, cut out of cardboard,
balances on a ruler edge, emphasises ‘what the arithmetic
mean is doing’. Pupils with a knowledge of mechanics will
understand why the expression $\sum f(x - m)/n$ is called the ‘first
moment’ of the distribution about the value $m$.


2(d) Moving Averages (Section 1; YK, Ch. 26)


(i) In a school course the analysis of time series will usually 
be limited to the use of the unweighted moving average as a
means of ‘smoothing’ time series, i.e. of separating random 
fluctuations from the long-term trend and from seasonal and 
other periodic changes. The examples should be selected from 
data of which the background is known so that interpretation 
of the results is possible. It is a subject which has many inter- 
esting facets for mathematicians and the temptation to explore 
it in class time may have to be resisted.

(ii) If $n$ is the ‘span’ of an unweighted moving average the
$(r + 1)$th mean is obtained from the $r$th mean by the use of
the formula

\[ m_{r+1} = m_r + \frac{1}{n}(x_{r+n} - x_r). \]

It pays to tabulate the work by setting out $x_1, x_2, x_3, \ldots$ in
rows of $n$ terms, putting the second row above the first, and so on.
The differences $(x_{r+n} - x_r)$ can then be written down in sequence
with the appropriate sign. As a check the final average should be
computed in the usual way.
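
The update rule for the unweighted moving average can be tried on a machine as well as on paper. The sketch below is only an illustration: the function name `moving_average` and the eight figures are invented for the purpose, and the final line repeats the check of computing the last average in the usual way.

```python
def moving_average(xs, n):
    """Unweighted moving averages of span n, using the update rule
    m_{r+1} = m_r + (x_{r+n} - x_r) / n."""
    if n <= 0 or n > len(xs):
        raise ValueError("span must be between 1 and len(xs)")
    means = [sum(xs[:n]) / n]          # first mean computed directly
    for r in range(len(xs) - n):
        # drop x_r from the span and bring in x_{r+n}
        means.append(means[-1] + (xs[r + n] - xs[r]) / n)
    return means


# Hypothetical quarterly figures, invented for illustration only.
data = [12, 15, 11, 18, 14, 17, 13, 20]
averages = moving_average(data, 4)

# Check: the final average computed in the usual way.
direct_last = sum(data[-4:]) / 4
print(averages, direct_last)
```

Only one subtraction and one division are needed for each new mean, which is the saving the tabulation is designed to give.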


2(e) Weighted Means (Section 1; YK, p. 332) 

Practical exercises on weighted means are afforded by traffic 
counts on two roads, on the same road at different times of the
day, or on different days. Comparison of the crude counts may
be misleading if the proportions of heavy and private vehicles 
are markedly different. The devising of a suitable system of 
weighting different types of vehicle is a useful exercise. 
Information about the numbers of licences issued and the 
classification of vehicles is given in the Annual Abstract. 


2(f) Index Numbers (Section 1; YK, Ch. 25) 

For the mathematical statisticians it is sufficient to limit the 
study of Index Numbers to the fact that they are weighted 
arithmetic or geometric means of price relations, using topical 
examples. The economists will attach more importance to this 
topic and a more detailed treatment may be required: that 
provided by YK includes an account of time-reversal and 
circular tests and should meet all school needs. 


2(g) Vital Statistics (YK, pp. 333-337) 


Another topic of social interest that depends on the applica- 
tion of the weighted mean is the correction of vital statistics. 
For example, some health resorts on the south coast of England 
may be found to have a higher crude death rate (measured as 
deaths per 1000 of the population) than some of the smoky 
towns in industrial areas. One reason is that the population 
distributions are different; people in the industrial towns are 
comparatively young workers, whereas the coast resorts 
attract many elderly retired people. Briefly, the standardised 
death rate is obtained by weighting the death rates in each age 
class by weights based on the age distribution of a standard 
population. Similar methods are applied to birth rates. 
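
A small numerical sketch may make the effect of standardisation plainer. All the figures below are invented for illustration (they are not taken from the Annual Abstract), and the names `weighted_mean`, `standard_pop`, etc. are assumptions of the sketch. The crude rate weights each age class by the town's own age distribution; the standardised rate uses the same standard weights for both towns.

```python
# Hypothetical figures, invented for illustration only: age-specific death
# rates (per 1000) for two towns, each town's own age distribution, and the
# age distribution of a standard population (proportions summing to 1).
age_classes = ["0-14", "15-44", "45-64", "65+"]
rates_resort = [2.0, 2.5, 10.0, 60.0]
rates_industrial = [3.0, 3.5, 14.0, 75.0]
pop_resort = [0.15, 0.30, 0.25, 0.30]
pop_industrial = [0.25, 0.45, 0.22, 0.08]
standard_pop = [0.22, 0.42, 0.24, 0.12]


def weighted_mean(values, weights):
    """Weighted mean: sum(w * v) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


crude_resort = weighted_mean(rates_resort, pop_resort)
crude_industrial = weighted_mean(rates_industrial, pop_industrial)
std_resort = weighted_mean(rates_resort, standard_pop)
std_industrial = weighted_mean(rates_industrial, standard_pop)

print("crude:       ", round(crude_resort, 2), round(crude_industrial, 2))
print("standardised:", round(std_resort, 2), round(std_industrial, 2))
```

With these invented figures the resort has the higher crude rate although its death rate is the lower in every age class; the standardised rates reverse the order.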


2(h) Measures of Dispersion (YK, Ch. 6) 


(i) A measure of the spread or dispersion of a distribution is
the second characteristic by which we compare one frequency
distribution with another.


(iii) The semi-interquartile range is usually determined 
graphically from the cumulative frequency curve. It provides 
an opportunity of mentioning percentiles and their uses, e.g. 
with examination marks, anthropometric data, etc., but it is 
of little mathematical interest.

(iv) The mean deviation is the mean of the absolute deviations.
Though it is a minimum when the deviations are
measured from the median it is usually more convenient to 
measure them from the mean. For a first treatment it is usu- 
ally good enough to measure the deviations from the centre of 
the class-interval which contains the mean, though the correc- 
tion is not difficult. 

(v) The standard deviation, $s$, is defined by $s^2 = \sum fd^2/n$
where $d$ is the deviation from the mean. It should be pointed
out that it is a particular case of the more general root-mean-square
deviation, that it is in fact the minimum value of that
quantity. This last relation should be proved before any com- 
putation is attempted, as it is required to explain the correction 
to be made in the computation when it is inconvenient to 
measure the deviations exactly from the mean. If the pupils 
have already met the Principle of Parallel Axes in Moments of 
Inertia it is helpful to point out the analogy between the s.d. 
and the radius of gyration of a plane lamina of the same shape 
as the histogram about a line in the plane which is parallel to 
the y-axis and which passes through the mean centre of the 
histogram. 

(vi) The tabulated computation of the s.d. is described in
most elementary text-books. Note that it is usually quicker
to find $fd^2$ from $fd \times d$ than from $f \times d^2$. The check based on
the identity $\sum f(d+1)^2 = \sum fd^2 + 2\sum fd + \sum f$ should be
demanded. As the computation is tedious, examples set as
class-work should be modest unless the work can be shared out.
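
The tabulated computation, with its check column and the correction for a working origin, might be sketched as follows. The grouped data, the working origin `a` and the class width `h` are invented for illustration; the correction used is the relation that the mean-square deviation about any origin exceeds $s^2$ by the square of the distance of that origin from the mean.

```python
import math

# Hypothetical grouped data: class mid-points x with frequencies f.
x = [5, 15, 25, 35, 45]
f = [3, 9, 16, 8, 4]
a, h = 25, 10            # working origin and class width (chosen for convenience)

d = [(xi - a) // h for xi in x]                        # deviations in class units
n = sum(f)
sum_fd = sum(fi * di for fi, di in zip(f, d))
sum_fd2 = sum((fi * di) * di for fi, di in zip(f, d))  # fd^2 found as (fd) x d

# Check column: sum f(d+1)^2 must equal sum fd^2 + 2 sum fd + sum f.
check = sum(fi * (di + 1) ** 2 for fi, di in zip(f, d))
assert check == sum_fd2 + 2 * sum_fd + n

mean = a + h * sum_fd / n
# Correction for measuring deviations from the working origin, not the mean:
variance = h * h * (sum_fd2 / n - (sum_fd / n) ** 2)
sd = math.sqrt(variance)
print(mean, variance, sd)
```

The same variance is obtained, more laboriously, by measuring every deviation from the mean itself.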

(vii) Sheppard’s correction for grouping is not required (nor 
is it generally applicable) but it is important to point out that 
grouping does introduce small errors.

(viii) It is of little interest to compute any of the various 
measures of dispersion (except as examples of computing 
techniques) unless they are used in comparing distributions. 

(ix) A comparison of the usefulness of the several measures 
of dispersion is best deferred until the pupils have had some 
experience of all of them. 

(x) Note that the dimensions of the measures of dispersion 
are those of the variable. 


2(j) Standard Measure

(i) Distributions of marks, particularly of intelligence-test 
scores, are sometimes compared by converting the ‘raw’ scores 
to ‘standard’ scores, which are obtained by expressing the 
deviations of the raw scores from their mean as multiples of 
the s.d. Sometimes a further conversion of the marks to a
standard scale with a given mean and s.d. is performed. One
or two numerical exercises, as extensions of the computation of
the s.d. of a distribution of marks, should suffice, e.g. Find the 
mean and s.d. of a given distribution of marks and convert 
them (or some of them) to a distribution which has the mean 
100 and s.d. 15. 
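
The conversion can be sketched in a few lines. The marks are invented for illustration and the function name `rescale` is an assumption of the sketch; the s.d. is computed with divisor $n$, as in 2(h).

```python
import math


def rescale(marks, new_mean=100.0, new_sd=15.0):
    """Convert raw marks to standard scores, then to a scale with the
    given mean and s.d."""
    n = len(marks)
    m = sum(marks) / n
    s = math.sqrt(sum((x - m) ** 2 for x in marks) / n)   # divisor n
    standard = [(x - m) / s for x in marks]               # standard scores
    return [new_mean + new_sd * z for z in standard]


raw = [42, 55, 61, 48, 70, 66, 53, 59]   # hypothetical marks
scaled = rescale(raw)
print([round(v, 1) for v in scaled])
```

The order of the candidates and the shape of the distribution are unchanged; only the origin and the unit of the scale are altered.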

(ii) The coefficient of variation, 100s/m, is sometimes used to 
compare variabilities, e.g. whether boys of 14 years vary more 
in height or in weight than boys of 17 years. It is obvious that 
it can be used only when the ratio s/m is small. 


2(k) Variance 


(i) The variance of x, V(x), is the square of the standard 
deviation. It is a statistic of great importance in more advanced 
work. Some algebraic exercises on it provide results that can 
be applied to elementary work.

(ii) In the combining of two sets of $n_1$ and $n_2$ observations of
a variable with means $m_1$, $m_2$, and variances $V_1$ and $V_2$, the
variance $V$ of the combined sets of observations can readily be
shown to be given by

\[ (n_1 + n_2)V = n_1 V_1 + n_2 V_2 + \frac{n_1 n_2}{n_1 + n_2}(m_1 - m_2)^2 \]

(Cf. the combined M.I. of two masses $M_1$, $M_2$, with moments
of inertia $I_1$, $I_2$ about their mean centre.)

If $n_1 = n_2$ and $m_1 = m_2$, then

\[ V = \tfrac{1}{2}(V_1 + V_2) \]
(iii) If $z_r = ax_r + by_r$, where $a$ and $b$ are constants, then

\[ \sum z_r = a\sum x_r + b\sum y_r \]

and therefore $\bar{z} = a\bar{x} + b\bar{y}$, where $\bar{x}$, $\bar{y}$, $\bar{z}$ are the means of the
$x$’s, $y$’s and $z$’s. Hence

\[ \sum (z_r - \bar{z})^2 = \sum \{a(x_r - \bar{x}) + b(y_r - \bar{y})\}^2 \]
\[ = a^2\sum (x_r - \bar{x})^2 + b^2\sum (y_r - \bar{y})^2 + 2ab\sum (x_r - \bar{x})(y_r - \bar{y}) \]

If $x$ and $y$ are independent, the product term may be
neglected in comparison with the others, and so

\[ V(z) = a^2 V(x) + b^2 V(y) \]

In particular if $a = b = 1$, or if $a = -b = 1$, then

\[ V(z) = V(x) + V(y) \]


(iv) The results: (a) That $\sum (x_r - \bar{x})(y_r - \bar{y})$ can be neglected
in comparison with $\sum (x_r - \bar{x})^2$ or $\sum (y_r - \bar{y})^2$; (b) That
$V(z) = V(x) + V(y)$ for the sum or difference of $x$ and $y$, where
$x$ and $y$ are independent variables, need practical verification.
Pairs of the digits 1, 2,...9 drawn from two card packs or
from a table of random numbers are suitable; if the zeros are
ignored the computation becomes simpler since $\bar{x} = \bar{y} = 5$ and
fractions are avoided.
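
Where a computing machine is available the same verification can be imitated with pseudo-random digits. The sketch below is an illustration under stated assumptions: the seed, the number of pairs (far more than a class would draw from card packs) and the helper name `variance` are all choices made for the sketch.

```python
import random


def variance(values):
    """Mean-square deviation about the mean (divisor n)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


random.seed(1)                      # fixed seed so that the run is repeatable
n = 20000
x = [random.randint(1, 9) for _ in range(n)]   # digits 1..9, zeros ignored
y = [random.randint(1, 9) for _ in range(n)]

v_x, v_y = variance(x), variance(y)
v_sum = variance([a + b for a, b in zip(x, y)])
v_diff = variance([a - b for a, b in zip(x, y)])

# Mean product of deviations: the term that is to be neglected.
mx, my = sum(x) / n, sum(y) / n
product_term = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

print(v_x, v_y, v_sum, v_diff, product_term)
```

The product term is small beside either variance, and both `v_sum` and `v_diff` lie close to `v_x + v_y`; the agreement is of course rougher with the forty or fifty pairs that a class can manage.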


2(l) Continuous Distributions (W, p. 12) 

(i) It is helpful to mathematical pupils familiar with the 
notations of the calculus if the descriptive statistics of con- 
tinuous frequency distributions are given in parallel with those 
of discrete distributions. 

(ii) The limit of a histogram in which the class intervals
become infinitesimally small as the frequencies tend to infinity
is a smooth curve which we assume can be expressed in the
form $y = f(x)$. If the continuous variable $x$ can take all values
within the range $a$ to $b$ and if $\int_a^b f(x)\,dx = 1$, so that the area
under the curve in this range is unity, the curve $y = f(x)$ is
called the relative frequency curve of the distribution. The function
$f(x)$ is called the relative frequency density. The area
property of the histogram is retained, i.e. the relative frequency
of the interval $h$ to $k$ is $\int_h^k f(x)\,dx$.


(iii) For the relative frequency distribution $y = f(x)$ we have
the mean,

\[ \mu = \int_a^b x f(x)\,dx \]

the variance,

\[ \sigma^2 = \int_a^b (x - \mu)^2 f(x)\,dx = \int_a^b x^2 f(x)\,dx - \mu^2 \]

The cumulative frequency curve is given by

\[ y = \int_a^x f(x)\,dx \]

and so the median is given by

\[ \int_a^x f(x)\,dx = \tfrac{1}{2} \]

and so on.
(iv) The functions $3x^2$ $(0 < x < 1)$, $e^{-x}$ $(0 < x < \infty)$,
$\frac{2}{\pi}\sin^2 x$ $(0 < x < \pi)$, $\frac{1}{2}\sin x$ $(0 < x < \pi)$, [expression illegible] $(-\infty < x < \infty)$
provide values of $f(x)$ suitable for simple exercises.
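
As a check on such exercises the integrals can be evaluated numerically. The sketch below takes $f(x) = 3x^2$ on $0 < x < 1$; the mid-point rule, the step counts and the name `integrate` are choices made for the illustration, and the exact values ($\mu = 3/4$, $\sigma^2 = 3/80$, median $= (1/2)^{1/3}$) come from direct integration.

```python
def integrate(g, a, b, steps=20000):
    """Mid-point rule; adequate for smooth integrands on a finite range."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def f(x):
    """Relative frequency density 3x^2 on 0 < x < 1."""
    return 3 * x * x


a, b = 0.0, 1.0

area = integrate(f, a, b)                                  # should be 1
mu = integrate(lambda x: x * f(x), a, b)                   # exact value 3/4
var = integrate(lambda x: x * x * f(x), a, b) - mu ** 2    # exact value 3/80

# Median: solve (integral of f from a to x) = 1/2 by repeated bisection.
lo, hi = a, b
for _ in range(40):
    mid = (lo + hi) / 2
    if integrate(f, a, mid, 2000) < 0.5:
        lo = mid
    else:
        hi = mid
median = (lo + hi) / 2                                     # exact value (1/2)**(1/3)

print(area, mu, var, median)
```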


3. PROBABILITY AND SAMPLING 
3(a) The Laws of Probability 

(i) Some revision of permutations and combinations may be 
needed. 

(ii) One mathematical definition of probability is: If an 
action can entail any one of n equally likely results, and if 
m of these results entail the occurrence of an event E and the 
remainder do not, then the ratio m/n is the probability of E.

An empirical definition of probability is: If an event E, one
of the mutually exclusive results of a trial, is found to occur
m times in n trials, then the limit of the ratio m/n as n tends to
infinity is the probability of E.

There are logical objections to both definitions though 
schoolboys are unlikely to raise them. A criticism of the first 
definition is that ‘likely’ is a synonym for ‘probable’ and that 
therefore the definition is circular; in the second definition it is 
assumed that the limit of m/n exists in the mathematical
sense. In a first school course of Statistics both definitions are 
required and are used as equivalents. A critical examination 
of them must await experience of their application by the 
pupils. For this later review the short discussion in Aitken, 
Statistical Mathematics, pp. 5-15, is useful, and Bernoulli’s
theorem as presented in W, pp. 33, 34, is of interest. 

(iii) Proofs of the theorems of total (‘either-or’) and com- 
pound (‘both-and’) probability are given by W (pp. 22, 23). 
These theorems are sometimes known as the ‘addition and 
product rules’. A third rule—the ‘at least one’ rule—is worth 
giving: If the probabilities of $n$ independent events are
$p_1, p_2, \ldots, p_n$ then the probability of at least one of the events
occurring is $1 - (1 - p_1)(1 - p_2) \cdots (1 - p_n)$, e.g. the chance of
at least one 6 in the tossing of $n$ unbiassed dice is $1 - (\tfrac{5}{6})^n$.

(iv) The useful concept of the expected value or expectation
for the mean value of a variable should be introduced (W,
p. 24), e.g. the expectation in a throw of two dice is 7.
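
Both the ‘at least one’ rule and the expectation for two dice can be checked exactly with rational arithmetic. In the sketch below the function name `at_least_one` and the choice of four dice are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product


def at_least_one(probs):
    """1 - (1 - p_1)(1 - p_2)...(1 - p_n) for independent events."""
    none = Fraction(1)
    for p in probs:
        none *= 1 - p
    return 1 - none


# Chance of at least one 6 when 4 unbiassed dice are tossed: 1 - (5/6)^4.
p_six = at_least_one([Fraction(1, 6)] * 4)

# Expectation of the total shown by two dice, from the 36 equally likely results.
totals = [a + b for a, b in product(range(1, 7), repeat=2)]
expectation = Fraction(sum(totals), len(totals))

print(p_six, float(p_six), expectation)
```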


3(b) Some Approximations 

A list of approximations that are used in subsequent work is 
appended; some numerical examples of their accuracy are 
needed to make their validity convincing. 

(i) Logarithmic approximations.

\[ \log_e (1 + x) \approx x - \tfrac{1}{2}x^2 \]

for small values of $x$ is well known.

(ii) Exponential approximations.

(a) $e^x \approx 1 + x$, $e^{-x} \approx 1 - x$ if $x$ is small.

(b) When $x$ is very small, $(1 + x)^n \approx e^{nx}$, $(1 - x)^n \approx e^{-nx}$.

(c) $e \approx (1 + 1/n)^{n+\frac{1}{2}}$. The error is of the order of $1/12n^2$ for
any value of $n$; e.g. when $n = 2$ the approximation gives
$e \approx 2.7557$.

(iii) Stirling’s approximation.

\[ n! \approx (2\pi n)^{1/2} e^{-n} n^n \]

for large integral values of $n$. There are various proofs but all
are either very long or beyond the usual school course. It is
used mainly to simplify ratios of factorials; in the form

\[ \frac{n!}{m!} \approx \frac{e^{-n} n^{n+\frac{1}{2}}}{e^{-m} m^{m+\frac{1}{2}}} \]

it can be verified by the use of the approximation (c).
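
Numerical examples of the accuracy of these approximations are easily produced. The values of $x$, $n$ and $k$ in the sketch below are arbitrary choices made for the illustration, and `stirling` is a name invented for it.

```python
import math

# (i) log_e(1 + x) against x - x^2/2 for a small x.
x = 0.05
log_exact, log_approx = math.log(1 + x), x - x * x / 2

# (ii)(c) e against (1 + 1/n)^(n + 1/2) for n = 2.
n = 2
e_approx = (1 + 1 / n) ** (n + 0.5)


# (iii) Stirling's approximation as a fraction of the true factorial.
def stirling(k):
    return math.sqrt(2 * math.pi * k) * math.exp(-k) * k ** k


ratios = {k: stirling(k) / math.factorial(k) for k in (5, 10, 20)}

print(log_exact, log_approx)
print(math.e, e_approx)
print(ratios)
```

The ratios in the last line fall short of 1 by roughly $1/12k$, so that even for $k = 5$ the approximation is within 2 per cent.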
3(c) The Binomial Distribution 
(i) Establish the binomial distribution by considering a 
repeated trial generally (W, p. 28).
(ii) There are various proofs of the formulae $\mu = np$,
$\sigma = (npq)^{1/2}$ (W, pp. 28, 46; YK, p. 174). More direct proofs are
simplified if the results below are first established:

\[ (a + x)^n = \sum_r \binom{n}{r} a^{n-r} x^r \]

Differentiate with respect to $x$,

\[ n(a + x)^{n-1} = \sum_r r \binom{n}{r} a^{n-r} x^{r-1} \qquad (1) \]

Multiply both sides by $x$ and differentiate again. On reducing
the left-hand side,

\[ n(a + x)^{n-2}(a + nx) = \sum_r r^2 \binom{n}{r} a^{n-r} x^{r-1} \qquad (2) \]
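
One way of completing the proof from results (1) and (2) is sketched here; it is offered as a possible classroom presentation and is not necessarily the form given in W or YK. Put $a = q$ and $x = p$, so that $a + x = 1$, and multiply each result by $p$.

```latex
% Put a = q, x = p (so that a + x = 1) in results (1) and (2),
% and multiply both sides of each by p.
\begin{align*}
\mu &= \sum_r r\binom{n}{r} q^{n-r} p^{r}
     = p \cdot n (q+p)^{n-1} = np, \\
\sum_r r^{2}\binom{n}{r} q^{n-r} p^{r}
    &= p \cdot n (q+p)^{n-2}(q + np) = np(q + np), \\
\sigma^{2} &= np(q + np) - (np)^{2} = npq .
\end{align*}
```

The last line uses the relation between the mean-square deviation about the origin and the variance, $\sigma^2 = \sum r^2 P_r - \mu^2$.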


(iii) It is important to show that the binomial distribution
for unequal values of p and q tends to a symmetrical form as 
the exponent increases (YK, p. 172). 

(iv) Numerical examples on the binomial distribution will be 
concerned to establish its use as a probability distribution by 
computing probable frequencies in simple cases and comparing 
them with observed and experimental frequencies. 


(v) Show that for large values of $n$ (100 or more) about 99%
of the area under the binomial histogram is included between
the ordinates at $\mu \pm 3\sigma$, and about 95% between the ordinates
at $\mu \pm 2\sigma$.
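
The statement about the area between these ordinates can be checked by summing binomial terms directly. The sketch below counts whole columns of the histogram whose centres lie within $k\sigma$ of the mean; the case $n = 100$, $p = \tfrac{1}{2}$ and the function names are choices made for the illustration.

```python
import math


def binomial_probs(n, p):
    """Probabilities of r = 0, 1, ..., n successes in n trials."""
    q = 1 - p
    return [math.comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]


def mass_within(n, p, k):
    """Total probability of the values r with |r - mu| <= k * sigma."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    probs = binomial_probs(n, p)
    return sum(pr for r, pr in enumerate(probs) if abs(r - mu) <= k * sigma)


within_2 = mass_within(100, 0.5, 2)   # r from 40 to 60
within_3 = mass_within(100, 0.5, 3)   # r from 35 to 65
print(within_2, within_3)
```

Other values of $p$ can be tried in the same way to see how quickly the rule becomes reliable as $n$ increases.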


3(d) The Poisson Distribution 
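(i) The derivation of the Poisson distribution from the binomial
usually requires the use of Stirling’s approximation for
$n!$ (YK, p. 190; W, p. 64). Alternative methods:

(a) It is required to show that, when $n \to \infty$ and $np = a$,
$\binom{n}{r} p^r q^{n-r} \to e^{-a}a^r/r!$.

\[ \binom{n}{r} p^r q^{n-r} = \binom{n}{r} p^r (1 - p)^{n-r} \]
\[ = \frac{n!}{(n-r)!\,r!} \left(\frac{a}{n}\right)^r \left(1 - \frac{a}{n}\right)^{n-r} \]
\[ = \frac{n!}{(n-r)!\,n^r} \left(1 - \frac{a}{n}\right)^{-r} \left(1 - \frac{a}{n}\right)^{n} \frac{a^r}{r!} \]
\[ = \frac{n!}{(n-r)!\,(n-a)^r} \left(1 - \frac{a}{n}\right)^{n} \frac{a^r}{r!} \]

As $a$ is a constant, the limiting value of $(1 - a/n)^n a^r/r!$ is $e^{-a}a^r/r!$,
and the limit of the coefficient for finite values of $a$ and $r$ can
be shown directly to be 1.

(b)
\[ q^n = (1 - p)^n = \left(1 - \frac{a}{n}\right)^n \approx e^{-a} \]
\[ nq^{n-1}p = np(1 - p)^{n-1} = a\left(1 - \frac{a}{n}\right)^{n-1} \approx ae^{-a} \]
\[ \frac{n(n-1)}{2!} q^{n-2}p^2 = \frac{a^2}{2!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{a}{n}\right)^{n-2} \approx \frac{a^2}{2!}e^{-a} \]

Similarly the $(r + 1)$th term is $\frac{a^r}{r!}e^{-a}$ for sufficiently large $n$.

Discussion of the error term need not delay some illustrative
examples.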
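
The approach of the binomial term to its Poisson limit is easily shown numerically. In the sketch below the values $a = 2$ and $r = 3$, the sequence of values of $n$ and the function names are arbitrary choices made for the illustration.

```python
import math


def binomial_term(n, r, p):
    """Probability of r successes in n trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def poisson_term(a, r):
    """Limiting value e^(-a) a^r / r!."""
    return math.exp(-a) * a ** r / math.factorial(r)


a, r = 2.0, 3
limit = poisson_term(a, r)
errors = {n: abs(binomial_term(n, r, a / n) - limit) for n in (10, 100, 1000, 10000)}

print(limit)
print(errors)
```

The error falls by a factor of about ten each time $n$ is multiplied by ten, which gives some feeling for the size of the neglected error term.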

(ii) The formulae $\mu = a$, $\sigma = \sqrt{a}$, can be derived directly from
the known binomial results $\mu = np$, $\sigma = (npq)^{1/2}$, or by taking
first and second moments about the origin (YK, p. 191). 
Note that the variance is numerically equal to the mean or 
expectation. 


(iii) Histograms of some Poisson distributions should be
drawn, e.g. $a = \tfrac{1}{2}, 1, 3, 10$, noting the trend towards symmetry
as $a$ increases.

(iv) In the absence of tables of the Poisson distribution
numerical examples are limited to the comparison of observed
and computed frequencies of distributions in which $a$ is not
large, and to simple examples of the type: It is
known that, on the average, $2\tfrac{1}{2}\%$ of a mass-produced article are
defective: what is the probability of finding 3 or more defective
articles in a package of 80?

[Expected number per package $= 2 = a$. Hence

\[ P(3 \text{ or more}) = 1 - e^{-2}\left(1 + 2 + \frac{2^2}{2!}\right) = 0.3233] \]
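
The arithmetic of the package example can be checked in a few lines, and the exact binomial figure set beside the Poisson one. The variable names are invented for the sketch; the Poisson tail evaluates to about 0.3233.

```python
import math

a = 80 * 0.025                       # expected number of defectives per package
poisson_tail = 1 - math.exp(-a) * (1 + a + a ** 2 / 2)

# Exact binomial figure for comparison (n = 80, p = 0.025).
binomial_tail = 1 - sum(
    math.comb(80, r) * 0.025 ** r * 0.975 ** (80 - r) for r in range(3)
)

print(round(poisson_tail, 4), round(binomial_tail, 4))
```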


(a) See games with car registration numbers in Section I,
p. 41.


(b) A few grains of sand are 
-inch squares. The number of 


triplets in equal secti 


(d) Samples are drawn from a
marbles, etc., usi 


3(e) Simple Sampling of Attributes (Section I; YK, Ch. 16, 
17, 23) 


(i) The first need is to establish meanings for the terms
random, bias and simple sampling; many useful ideas are given
in the YK reference quoted. Secondly, a clear distinction 
should be drawn between the sampling of attributes and the 
sampling of variables. 

In simple sampling it is assumed that the selection is random 
and, further, that the chance of a success, p, is constant for 
each item drawn. Each item must therefore be replaced before 
the next item is selected, or the population from which the 
sample is drawn must be very large, e.g. the probabilities of 
hearts in 4-fold samples from a pack of cards are: 


No. of hearts in sample      0       1       2       3       4
With replacement           0.3164  0.4219  0.2109  0.0469  0.0039
Without replacement        0.3038  0.4388  0.2135  0.0412  0.0026


(Hogben, Chance and Choice, Vol. 1, p. 97.) 
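
The two rows of the table can be recomputed as a check. The first row is the binomial with $n = 4$, $p = \tfrac{1}{4}$; the second counts selections of 4 cards from the 52 directly. The function names below are invented for the sketch.

```python
import math


def with_replacement(r, n=4, p=13 / 52):
    """Binomial probability of r hearts in n draws with replacement."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


def without_replacement(r, n=4, hearts=13, pack=52):
    """Probability of r hearts when n cards are drawn without replacement."""
    return math.comb(hearts, r) * math.comb(pack - hearts, n - r) / math.comb(pack, n)


table = [(r, round(with_replacement(r), 4), round(without_replacement(r), 4))
         for r in range(5)]
for row in table:
    print(row)
```

The differences between the rows are small even for a population of only 52, which supports the use of simple sampling theory when the population is large.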


(ii) Since in simple sampling of attributes we are concerned
with repeated independent trials, each with a constant probability
of success, the probability of $r$ successes in an n-fold
sample is given by the $(r + 1)$th term of the expansion of
$(q + p)^n$. The binomial distribution then becomes the sampling
distribution, i.e. the standard or model against which our
sampling of attributes is to be compared. We already know
that for the binomial distribution $\mu = np$ and $\sigma = (npq)^{1/2}$, or
if we use proportions instead of actual frequencies, $\mu = p$ and
$\sigma = (pq/n)^{1/2}$.

(iii) In a first course there are two types of question to which 
the theory of the simple sampling of attributes gives an 
answer, though only in terms of probability: 

(a) If an n-fold sample gives a proportion, p, of items with a
certain attribute, what is the proportion of items with that 
attribute in the sampled population? 

(b) If in a trial experiment to test a hypothesis a proportion 
$p_1$ of items with a certain attribute is found in an n-fold
sample, does this result invalidate the hypothesis that the 
proportion is p? 

(iv) In questions of the first type the correct value of $p$ is
unknown. The estimate of $p$ provided by the sample is
accepted and is used to determine the standard error of the
sampling distribution, i.e. $(pq/n)^{1/2}$. From our knowledge of the
binomial distribution for values of $n$ of 100 or more we can
then say that the required proportion lies within the range
$p \pm 3 \times \text{s.e.}$ with a probability of about 0.99.

(v) In questions of the second type the s.e. is computed
from $p$. Then if $|p - p_1| < 3 \times \text{s.e.}$, we say that the difference
could arise by fluctuation of random sampling or that the
difference is not significant at the $3\sigma$ level, and that the result
affords no evidence on which to reject the hypothesis. If
$|p - p_1| > 3 \times \text{s.e.}$, we say that it is very improbable that
this difference could arise from random sampling, or that
the difference is significant at the $3\sigma$ level, and that the difference
is sufficiently great to lead us to doubt the hypothesis or
to re-examine the conditions of sampling.
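
The two types of question can be set out as short routines. The coin-tossing figures below are invented for illustration, and the function names `three_sigma_test` and `three_sigma_range` are assumptions of the sketch; the first computes the s.e. from the hypothetical proportion, the second from the sample estimate.

```python
import math


def three_sigma_test(successes, n, p):
    """Second type of question: compare an observed proportion p1 with
    the hypothetical proportion p, using the s.e. computed from p."""
    p1 = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p1, se, abs(p1 - p) > 3 * se


def three_sigma_range(successes, n):
    """First type of question: range p +/- 3 x s.e., with the s.e.
    computed from the sample estimate of p."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - 3 * se, p + 3 * se


# Hypothetical experiment: 400 tosses of a coin give 236 heads; is p = 1/2 tenable?
p1, se, significant = three_sigma_test(236, 400, 0.5)
print(p1, se, significant)
print(three_sigma_range(236, 400))
```

Here the difference 0.09 exceeds $3 \times 0.025$, so the result would be called significant at the $3\sigma$ level; 212 heads in 400 tosses would not.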


3(f) The Normal Distribution 


(i) Though the derivation of the equation of the Normal
curve from the binomial distribution is not usually required
for examination purposes it can be demonstrated (YK, p. 177;
W, p. 64).

(ii) The next step is to investigate the properties of the
curve as an exercise in curve tracing. […]tions have not been discussed
[…] needed to introduce the terms […]

(iii) The area under a probability curve is unity
and therefore the equation

\[ \int_{-\infty}^{\infty} k \exp(-x^2/2\sigma^2)\,dx = 1 \]

has to be solved for the constant $k$. A solution avoiding
Gamma functions is given by W, p. 65.

(iv) As the curve is introduced as an approximation to the
binomial histogram a numerical and graphical example to
illustrate the goodness of fit will be needed.

(v) Teachers will find the use of Arithmetic Probability
graph paper very useful for the preparation or checking of
examples on the Normal distribution. The paper is ruled so
that the cumulative frequency curve

\[ y = \int_{-\infty}^{x} \exp(-x^2)\,dx \]

becomes a straight line. Values of $\mu$ and $\sigma$ for a given distribution
can be obtained directly from the graph, or alternatively
the paper provides an easy method of preparing a Normal
frequency distribution with a given $\mu$ and $\sigma$.

(vi) For future work, tables of ordinates of, and of areas
under, the Normal curve will be required. For most school
purposes carefully drawn graphs can give the necessary degree
of accuracy and have the advantage over tables of showing
more clearly what is being done when reference is made to
them. For the ordinates it is necessary to plot values of
$\sigma y = (2\pi)^{-1/2} \exp(-x^2/2\sigma^2)$ for a few values of $x/\sigma$ (e.g. 0,
$\tfrac{1}{2}$, 1, ... 3, $3\tfrac{1}{2}$). For the areas it is necessary to evaluate

\[ \frac{1}{\sigma\sqrt{2\pi}} \int_0^x \exp(-x^2/2\sigma^2)\,dx \]

for the same values of $x/\sigma$ by
expanding the exponential in series and integrating term by
term (vide Whittaker and Robinson, The Calculus of Observations,
pp. 180, 181). The form of the graphs shown in Fig. 10
is convenient for reference.

[Fig. 10. Ordinates of, and areas under, the Normal curve. Horizontal axis: deviation in terms of $\sigma$. Curves shown: ordinate of the Normal curve; area between the ordinates at 0 and $N$; area of the tail beyond the ordinate at $N$.]
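
The series method for the areas can be sketched as follows, working in units of $\sigma$ so that $z = x/\sigma$. The expansion used is $\int_0^z e^{-t^2/2}\,dt = \sum_k (-1)^k z^{2k+1} / \{2^k\,k!\,(2k+1)\}$; the function names and the number of terms retained are choices made for the illustration.

```python
import math


def ordinate(z):
    """sigma * y for a deviation of z standard deviations."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def area_0_to(z, terms=60):
    """Area between the ordinates at 0 and z, found by integrating the
    series for exp(-t^2/2) term by term."""
    total, term = 0.0, z                 # term = (-1)^k z^(2k+1) / (2^k k!)
    for k in range(terms):
        total += term / (2 * k + 1)
        term *= -z * z / (2 * (k + 1))
    return total / math.sqrt(2 * math.pi)


# Values for deviations 0, 0.5, 1, ..., 3.5 in terms of sigma.
table = [(z / 2, ordinate(z / 2), area_0_to(z / 2)) for z in range(8)]
for z, y, area in table:
    print(f"{z:3.1f}  {y:.4f}  {area:.4f}  tail {0.5 - area:.4f}")
```

By hand, four or five terms of the series suffice for deviations up to about $1\sigma$; more are needed further out, where the terms at first increase before they diminish.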

(vii) The Normal is the only continuous distribution listed 
in most of the syllabuses, but pupils should not be allowed to 
conclude that it is the only one and that all sampling distribu- 
tions are Normal. The name ‘Normal’ can mislead, particu- 
larly if it is written ‘normal’; a distribution that is ‘not normal’ 
can only be ‘abnormal’! The name ‘Gaussian’ is an alternative 
though it does less than justice to De Moivre and Laplace. 


3(g) The Rectangular Distribution


The rectangular distribution $y = 1/(b - a)$, $a < x < b$, can
be both continuous and discrete. It is of no special statistical 
importance, but it is useful to the teacher because its simple 
properties are easy to investigate both mathematically and 
practically; its simplicity enables it to be used in the class- 
room as a kind of ‘proving ground’ for many statistical ideas. 


3(h) Sampling of Variables—Large Samples (YK, Ch. 18; W, 
p. 116) 


(i) In the sampling of variables the items selected have a 
value of a variable which may be discrete or continuous over a 
range which may or may not be known. The binomial pro- 
vided the probability distribution for the sampling of attri- 
butes; for the sampling of variables we have to seek the 
probability distribution that is appropriate to the population 
being sampled. 

(ii) Typical problems in the sampling of variables are: 

(a) What is the mean of the population sampled if an n-fold 
sample has a mean value of m and a standard deviation s? 

(b) Is the difference between an expected mean $\mu$ and the
mean $m$ given by an n-fold sample significant?

(c) The means of two samples are $m_1$ and $m_2$; is the difference
of these means consistent with the hypothesis that they were
drawn from the same population?

(iii) These questions again can be answered only in terms of 
probability and to answer them we need to know the standard 
error of the mean. 

(iv) A number of n-fold samples drawn from the same population
gives a distribution of values of $m$. This fact should be
established practically (for small samples) by, for instance,
finding the means of samples of 4, 9, . . . random numbers. Is
it possible to determine the characteristics of these distributions
of $m$?

We already know that if $z_r = x_r + y_r$, where $x_r$ and $y_r$ are
independent, then $V(z) = V(x) + V(y)$. If $m$ is the mean of
$x_1, x_2, \ldots x_r, \ldots x_n$, we know that

\[ nm = x_1 + x_2 + \ldots + x_r + \ldots + x_n = \sum x_r, \]

and as the values of $x$ are independent, we have by extension
of our variance formula,

\[ V(nm) = \sum V(x_r) \]

But $V(x_r) = \sigma^2$, the variance of the distribution of $x$’s from
which the sample is drawn.

Therefore $V(nm) = n\sigma^2$
and s.d.$(nm) = \sigma\sqrt{n}$
or s.d.$(m) = \sigma/\sqrt{n}$

The standard deviation of any statistic of a sampling distribution
is called the standard error of that statistic. Hence we
have s.e.$(m) = \sigma/\sqrt{n}$, where $\sigma$ is the s.d. of the universe from
which the sample is drawn and $n$ is the sample size. Note that
this formula has been obtained without specifying the form of
the sampled distribution.
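
The practical establishment of this result can be imitated with pseudo-random digits in place of a table of random numbers. In the sketch below the seed, the number of samples and the function name `sd_of_sample_means` are choices made for the illustration; the universe is the rectangular distribution of the digits 0 to 9.

```python
import math
import random

random.seed(2)                       # fixed seed so that the run is repeatable
digits = range(10)                   # random digits 0..9
sigma = math.sqrt(sum((d - 4.5) ** 2 for d in digits) / 10)   # s.d. of the universe


def sd_of_sample_means(n, samples=5000):
    """Observed s.d. of the means of many n-fold samples of random digits."""
    means = [sum(random.choice(digits) for _ in range(n)) / n for _ in range(samples)]
    m = sum(means) / samples
    return math.sqrt(sum((v - m) ** 2 for v in means) / samples)


# For each sample size: (observed s.d. of the means, sigma / sqrt(n)).
results = {n: (sd_of_sample_means(n), sigma / math.sqrt(n)) for n in (4, 9, 16)}
for n, (observed, theoretical) in results.items():
    print(n, round(observed, 3), round(theoretical, 3))
```

A histogram of the means for $n = 9$ or 16 is already distinctly one-humped although the sampled distribution is rectangular.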

(v) This result gives only the s.d. of the distribution of 
means, though we have implicitly assumed that the mean of 
the distribution is the mean of the sample means. We know 
nothing however of the form of the distribution. It can be 
shown theoretically that if the sampled population is Normal,
then the distribution of means is also Normal, but the distribu- 
tion of means of samples drawn from other one-humped dis- 
tributions at least approximates to the Normal. Though these 
facts cannot be established theoretically in a school course 
they can be shown by practical examples to have some basis. 

In applying the s.e. of the mean in answering the questions 
of para. (ii) we can therefore use the $3\sigma$ level of significance.
If the s.d. of the population is unknown, then the information 
provided by the samples is pooled; since they are large samples 
the estimates of o they provide are not greatly in error. 


3(j) Population Parameters and Sample Statistics 


It is conventional to distinguish between parameters which 
refer to a universe or population, and estimates of them derived 
from a sample, by using Greek letters for the parameters and 
corresponding Latin letters for the estimates. Thus m is used
to denote an estimate of μ, s to denote an estimate of σ, r to
denote an estimate of ρ, and so on. In a school course it is
helpful to bear this useful convention in mind from the begin- 
ning of the course so that the distinction that has to be made 
when sampling is considered is not blurred by earlier violations 
of the convention. 

Unfortunately this useful convention breaks down when 
applied to the usual symbols of the binomial distribution. The 
letters p and q are conventionally used to denote both the
observed and the theoretical proportions of successes and
failures; their Greek analogues π and χ could be confusing since
they have other established meanings which may be simultaneously
required. Other possibilities are to use p = est(P),
q = est(Q), or a = est(α), b = est(β), but of course any such
convention must first gain general acceptance. 


3(k) Levels of Significance and Confidence Limits


(i) If a sampling distribution is known to be Normal it is 
possible to be more precise about levels of significance. First it 
should be noted that levels of significance are purely arbitrary;
the choice depends in part on the nature of the problem, on the
importance of any action that depends on the result, on the
caution of the investigator. There are two levels which are
commonly used, the 5% and the 1% levels. These correspond
roughly to the 2σ and 3σ levels of the Binomial distribution,
since for the Normal distribution 5% of the area under the
curve lies outside the ordinates at μ ± 1.96σ, and 1% outside
those at μ ± 2.58σ. The ordinates at μ ± 3σ exclude all but
0.27% of the area under the Normal curve. For some purposes
levels of 0.1% may be required. Since the level is a matter of
choice it is important always to complete any statement about 
significance by specifying the level used. 
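
The areas quoted for the Normal curve can be verified without tables. This sketch uses the complementary error function from Python's standard library; the function name `two_tail_area` is an illustrative choice, not a term from the text.

```python
import math

def two_tail_area(k):
    """Proportion of the area under the Normal curve lying outside
    the ordinates at mu - k*sigma and mu + k*sigma."""
    return math.erfc(k / math.sqrt(2))

for k in (1.96, 2.58, 3.0):
    print(f"outside mu ± {k} sigma: {100 * two_tail_area(k):.2f}%")
```

The three printed percentages should come out very near 5%, 1% and 0.27% respectively.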

(ii) If the sampling distribution is known, but is not Normal, 
the deviations corresponding to 5% and 1% levels of signifi- 
cance can be determined. If nothing is known of the sampling 
distribution then at worst the probability that a sample value 
will differ from the mean by more than λσ is not greater than
1/λ² (Tchebychef’s theorem; W, p. 33). Usually however we
can do much better than this; most elementary statistics 
computed from random samples of non-Normal distributions 
are approximately Normal if the sample is large. 
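
How cautious the limit 1/λ² is can be seen by comparing it with the proportions actually found in a markedly skew set of values. The sketch below assumes, purely for illustration, 20,000 values drawn from an exponential distribution; the distribution, the seed and the values of λ are not from the text.

```python
import random
import statistics

rng = random.Random(2)
# Assumed skew universe: 20,000 values from an exponential distribution.
values = [rng.expovariate(1.0) for _ in range(20000)]
mean = statistics.fmean(values)
sd = statistics.pstdev(values)

def fraction_outside(lam):
    """Observed proportion of values differing from the mean by more than lam * sd."""
    return sum(abs(v - mean) > lam * sd for v in values) / len(values)

for lam in (1.5, 2, 3):
    print(f"lambda = {lam}: observed {fraction_outside(lam):.4f}, "
          f"limit 1/lambda^2 = {1 / lam ** 2:.4f}")
```

The observed proportions never exceed the limit, and in this instance fall well below it.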

(iii) Sometimes confidence or fiducial limits are used. The 
terms are not strictly synonymous but the distinction is 
not important in a school course. For example, if for a 
sample drawn from a normal population m = 3.60 with a
standard error of 0.12, the 95% confidence limits for μ are
3.60 ± 1.96 × 0.12, i.e. 3.84 and 3.36.
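
The arithmetic of this example, as a short sketch. The function name `confidence_limits` and its parameter `k` are illustrative; k = 1.96 is the Normal deviate for the 95% level.

```python
def confidence_limits(m, se, k=1.96):
    """Return the limits m - k*se and m + k*se. With k = 1.96 these are
    the 95% limits when the sampling distribution of m is Normal."""
    return m - k * se, m + k * se

lower, upper = confidence_limits(3.60, 0.12)
print(f"{lower:.2f} to {upper:.2f}")  # prints "3.36 to 3.84"
```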

(iv) The form of a statistical question involving significance 
is sometimes important. The questions: ‘Is this coin biassed?’ and
‘Does this coin give more heads than we expect of an unbiassed 
coin?’ are statistically different. If a trial of 100 tosses pro- 
duced only 10 heads we would have little hesitation in answer- 
ing ‘Yes’ to the first question and ‘No’ to the second. To the 
first question we apply a ‘two-tail’ test, i.e. we allow for the 
area under the probability curve outside both significance 
limits; to the second question we apply a ‘one-tail’ test, i.e. we
are concerned with the area under only one tail. 
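
For the trial of 100 tosses giving 10 heads, the two tests can be worked exactly from the Binomial distribution with p = 1/2. This is a sketch only; the helper names are illustrative and the exact Binomial is used here in place of the Normal approximation.

```python
from math import comb

def prob_at_least(n, k):
    """P(X >= k) for the number of heads X in n tosses of an unbiassed coin."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def prob_at_most(n, k):
    """P(X <= k) for the number of heads X in n tosses of an unbiassed coin."""
    return sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n

n, heads = 100, 10
expected = n // 2
deviation = abs(heads - expected)

# 'Is this coin biassed?' Two tails: results at least as far from 50 either way.
two_tail = prob_at_most(n, expected - deviation) + prob_at_least(n, expected + deviation)

# 'Does this coin give more heads than we expect?' One tail: 10 heads or more.
one_tail = prob_at_least(n, heads)

print(f"two-tail probability = {two_tail:.3g}")
print(f"one-tail probability = {one_tail:.3g}")
```

The two-tail probability is minute, so the answer to the first question is ‘Yes’; the one-tail probability is practically 1, so the answer to the second is ‘No’.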




3(l) Sampling of Variables: Small Samples (YK, Ch. 21; W,
Ch. X)

(i) In considering small samples (i.e. n < 100) it is necessary
to subject the theory of large sampling to a critical examina- 
tion for possible errors and then to allow for them. The 
results are: 

(a) First the form of the sampled population has to be 
specified; the notes that follow apply only to a sampled popu- 
lation that is Normal or approximately so. 

(b) The distribution of the means of an n-fold sample drawn 
from a Normal population (mean μ, variance σ²) is also Normal
(mean μ, variance σ²/n) whether the sample is large or small.
Hence m = est(μ) as for large samples (W, p. 119).

(c) The variance of the sample, s², gives an estimate that
must be corrected for the systematic error that arises from the
fact that in computing s² the deviations are measured from m
and not from μ (which is usually unknown). By making the
adjustment it is found that the ‘unbiassed’ estimate of σ² is
given by Σ(x − m)²/(n − 1) rather than by Σ(x − m)²/n
(W, p. 181). Note that ‘unbiassed’ is here given a technical 
meaning (YK, p. 547). 

(d) In the theory of large samples we assumed that the
distribution of (m − μ)/(s/√n) was Normal, or approximately so. This
distribution, however, deviates appreciably from the Normal
if n is small. The exact distribution of this statistic, called the 
t-distribution, was first determined by ‘Student’ and tabu- 
lated: the derivation is not required for a school course but is 
given in W, p. 186. 

(e) Using tables of the t-distribution it is possible to determine
the significance at any given level of a difference between
the mean of a sample and its hypothetical value, or between 
the means of two small samples, by methods analogous to 
those used for large samples. 

(f) This more exact test applies equally well to large samples
as to small ones, but as the sample size increases the t-distribution
approaches the Normal distribution more and more
closely. For samples of 100 or more, therefore, the more convenient
Normal probabilities are usually applied.
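
The departure of the statistic t = (m − μ)/(s/√n) from the Normal form in small samples can be shown by a sampling experiment. The sketch below assumes a Normal population with μ = 50 and σ = 10, and compares samples of 5 with samples of 100; the function name, the seed and the number of trials are illustrative assumptions.

```python
import random
import statistics

def fraction_beyond(n, limit=1.96, trials=5000, mu=50.0, sigma=10.0, seed=3):
    """Proportion of random samples of size n for which
    |(m - mu) / (s / sqrt(n))| exceeds `limit`."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        s = statistics.stdev(sample)  # uses the divisor n - 1
        if abs((m - mu) / (s / n ** 0.5)) > limit:
            count += 1
    return count / trials

small = fraction_beyond(5)
large = fraction_beyond(100)
print(f"n = 5:   {100 * small:.1f}% of samples lie beyond ±1.96")
print(f"n = 100: {100 * large:.1f}% of samples lie beyond ±1.96")
```

For the Normal curve 5% of the area lies beyond ±1.96. The samples of 100 should give a figure near this, while the samples of 5 should give a distinctly larger one, which is why the t-tables are needed.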


TABLE 8
An approximately normal distribution
(n = 1000, μ = 50, σ = 10)


Pop. No.    Value    Frequency    Value    Pop. No.
      1       18         1          82        1000
      2       20         1          80         999
      3       21         1          79         998
      4       22         1          78         997
      5       23         1          77         996
      6       24         1          76         995
  7-  8       25         2          75      993-994
  9- 10       26         2          74      991-992
 11- 13       27         3          73      988-990
 14- 17       28         4          72      984-987
 18- 21       29         4          71      980-983
 22- 26       30         5          70      975-979
 27- 33       31         7          69      968-974
 34- 41       32         8          68      960-967
 42- 50       33         9          67      951-959
 51- 61       34        11          66      940-950
 62- 74       35        13          65      927-939
 75- 89       36        15          64      912-926
 90-106       37        17          63      895-911
107-125       38        19          62      876-894
126-147       39        22          61      854-875
148-171       40        24          60      830-853
172-198       41        27          59      803-829
199-227       42        29          58      774-802
228-258       43        31          57      743-773
259-291       44        33          56      710-742
292-326       45        35          55      675-709
327-363       46        37          54      638-674
364-401       47        38          53      600-637
402-440       48        39          52      561-599
441-480       49        40          51      521-560
481-520       50        40




(ii) The concept of degrees of freedom arises in the applica- 
tion of the t-test. The number of degrees of freedom is the 
number of independent variables from which a statistic is 
computed. In computing the mean of an n-fold sample there 
are n degrees of freedom; in computing the variance of the
sample we have to sum Σ(x_r − m)² where Σ(x_r − m) = 0, i.e.
the n variables (x_r − m) are constrained by the relation that
their sum is zero, and one degree of freedom is lost. Hence the
‘unbiassed’ estimate of the variance of a sampled population is
Σ(x_r − m)²/(n − 1) if m is estimated from the sample, or
Σ(x_r − μ)²/n if μ is the known mean of the sampled population.

The concept becomes clearer if very small samples are con- 
sidered. A sample of one gives an estimate of the mean but no 
information about the variance unless the mean is already 
known: a sample of at least two is needed to give an estimate 
of the variance if the mean is unknown. 
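
The effect of the lost degree of freedom shows clearly in a sampling experiment with very small samples. The sketch below assumes a Normal population with μ = 50 and σ² = 100 and samples of 4; these figures, the seed and the number of trials are illustrative only.

```python
import random
import statistics

rng = random.Random(4)
mu, sigma, n, trials = 50.0, 10.0, 4, 20000

total_div_n = total_div_n_minus_1 = total_known_mean = 0.0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    m = statistics.fmean(sample)
    squares_about_m = sum((x - m) ** 2 for x in sample)
    total_div_n += squares_about_m / n
    total_div_n_minus_1 += squares_about_m / (n - 1)
    # Deviations taken from the known population mean: divisor n.
    total_known_mean += sum((x - mu) ** 2 for x in sample) / n

avg_div_n = total_div_n / trials
avg_div_n_minus_1 = total_div_n_minus_1 / trials
avg_known_mean = total_known_mean / trials
print(f"about m,  divisor n:     {avg_div_n:.1f}")
print(f"about m,  divisor n - 1: {avg_div_n_minus_1:.1f}")
print(f"about mu, divisor n:     {avg_known_mean:.1f}")
```

On the average the first estimate falls short of σ² = 100 by the factor (n − 1)/n, while the other two come close to it.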

(iii) For sampling tests with small samples an artificial 
Normal distribution is useful. It can consist merely of a 
Normal frequency distribution to be used with a table of 
random numbers, or of a set of counters each marked with a 
number which can be sampled as required (Table 8). 
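
One way of using Table 8 with random numbers is sketched below. The frequencies are copied from the table; the pseudo-random generator stands in for a printed table of random numbers, and the names `value_for` and `upper_pop_no` are illustrative.

```python
import bisect
import random

# Lower half of Table 8 (values 18 to 49); the value 19 does not occur.
lower_half = {18: 1, 20: 1, 21: 1, 22: 1, 23: 1, 24: 1, 25: 2, 26: 2, 27: 3,
              28: 4, 29: 4, 30: 5, 31: 7, 32: 8, 33: 9, 34: 11, 35: 13,
              36: 15, 37: 17, 38: 19, 39: 22, 40: 24, 41: 27, 42: 29,
              43: 31, 44: 33, 45: 35, 46: 37, 47: 38, 48: 39, 49: 40}
frequency = dict(lower_half)
frequency[50] = 40
for value, f in lower_half.items():
    frequency[100 - value] = f  # the table is symmetrical about 50

values = sorted(frequency)
# upper_pop_no[i] is the last population number allotted to values[i].
upper_pop_no = []
running = 0
for v in values:
    running += frequency[v]
    upper_pop_no.append(running)

def value_for(pop_no):
    """Value carried by the member with population number pop_no (1 to 1000)."""
    return values[bisect.bisect_left(upper_pop_no, pop_no)]

size = sum(frequency.values())
pop_mean = sum(v * f for v, f in frequency.items()) / size
pop_sd = (sum(f * (v - pop_mean) ** 2 for v, f in frequency.items()) / size) ** 0.5

rng = random.Random(5)
sample = [value_for(rng.randint(1, size)) for _ in range(10)]
print("population size", size, "mean", pop_mean, "s.d.", round(pop_sd, 2))
print("a random sample of ten:", sample)
```

Each random number from 1 to 1000 picks out one member of the population, so every member is equally likely to be drawn, as with the marked counters.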


4. BIVARIATE DISTRIBUTIONS (Section I; YK, Ch. 9; W,
Ch. IV) 


4(a) Scatter Diagrams 


sentation by a ‘dot-diagram’ should be shown. 
(ii) Meanings for the terms unassociated variables, variables
with linear association, variables associated non-linearly are


4(b) Linear Regression 
(i) The lines of regression are defined as the loci of the means
of the two sets of arrays (i.e. the rows and columns) of a 
grouped frequency distribution. In small samples the means 
are rarely exactly linear and some method of estimating the 
lines is required. A graphical method is described in Section I. 

(ii) The direct calculation of the equations of regression lines
involves the computation of the covariance, or first product-moment,
of the distribution. It is given by

Σ(x_r − m_x)(y_r − m_y)/(n − 1)
and some of its algebraic properties should be established
before its computation is explained. One property that will
be needed is that
Σ x_r y_r = Σ X_r Y_r − nab

where x_r, y_r are the deviations from the true means; X_r, Y_r are
the deviations from arbitrary means, and a, b are the deviations
of the true means from the arbitrary means.
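
The property can be checked numerically before it is proved. In this sketch the five pairs of values and the arbitrary means 65 and 130 are invented for illustration.

```python
# Hypothetical paired observations (not from the text).
x_data = [61, 64, 67, 70, 73]
y_data = [118, 126, 131, 142, 148]
n = len(x_data)

mean_x = sum(x_data) / n
mean_y = sum(y_data) / n

# Arbitrary (working) means chosen for easy arithmetic.
arbitrary_x, arbitrary_y = 65, 130
a = mean_x - arbitrary_x  # deviation of the true mean from the arbitrary mean
b = mean_y - arbitrary_y

# Sum of products of deviations from the true means.
sum_xy_true = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_data, y_data))

# Sum of products of deviations from the arbitrary means, corrected by nab.
sum_xy_arbitrary = sum((x - arbitrary_x) * (y - arbitrary_y)
                       for x, y in zip(x_data, y_data))
corrected = sum_xy_arbitrary - n * a * b

covariance = sum_xy_true / (n - 1)
print(sum_xy_true, corrected, covariance)
```

Both routes give the same sum of products, so the covariance may be computed from whichever working means make the arithmetic lightest.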

(iii) The derivation of the formulae of the regression coefficients
using the Principle of Least Squares is given in W,
p. 70, and avoiding calculus methods in YK, p. 216. 

(iv) Before computing the coefficients it is important to 
transform the data to the simplest possible form for computa- 
tion (and to transform the equations back again after the 
computation). 

(v) The following variations of the computation occur: 

(a) Direct computation of both lines from ungrouped data 
(YK, p. 224). 

(b) Computation of both lines from grouped data, (i) making
individual entries of the deviation products for each
bivariate group (YK, p. 226), (ii) using the ‘diagonal’
method based on the identities
Σ(x ± y)² = Σ(x²) + Σ(y²) ± 2Σ(xy)
by which the products are computed using tables of squares
(YK, p. 229).

(c) Computation of a linear trend line for time-series. 
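
A sketch of variation (a), the direct computation of both lines from ungrouped data, with the data first transformed to a simpler form and the coefficient transformed back afterwards. The data, the coding u = (x − 67)/3, v = y − 133 and all names are invented for illustration; the coefficients are the usual least-squares ones, Σxy/Σx² and Σxy/Σy², with x and y measured from their means.

```python
def regression_coefficients(xs, ys):
    """Return (m_x, m_y, b_yx, b_xy): the two means, the coefficient of the
    regression of y on x and the coefficient of the regression of x on y."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - m_x) ** 2 for x in xs)
    syy = sum((y - m_y) ** 2 for y in ys)
    sxy = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    return m_x, m_y, sxy / sxx, sxy / syy

# Hypothetical paired observations (not from the text).
x_data = [61, 64, 67, 70, 73]
y_data = [118, 126, 131, 142, 148]

m_x, m_y, b_yx, b_xy = regression_coefficients(x_data, y_data)
print(f"y on x: y - {m_y} = {b_yx:.4f} (x - {m_x})")
print(f"x on y: x - {m_x} = {b_xy:.4f} (y - {m_y})")

# The same computation on coded data: u = (x - 67)/3, v = y - 133.
u_data = [(x - 67) / 3 for x in x_data]
v_data = [y - 133 for y in y_data]
_, _, b_vu, _ = regression_coefficients(u_data, v_data)
b_yx_from_coded = b_vu / 3  # transform back: one unit of u is 3 units of x
print(f"b_yx recovered from the coded data: {b_yx_from_coded:.4f}")
```

The coded values are small whole numbers, which is the point of the transformation when the work is done by hand; the coefficient recovered from them agrees with the direct computation.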

(vi) Though significance tests of regression coefficients are
not required in a school course it should be pointed out that
they are subject to sampling errors.




4(c) The Product-Moment Correlation Coefficient 


(i) The product-moment coefficient of correlation, r, is a 
measure of the linear association of two variables. A low value 
of | r | does not necessarily imply a lack of association; a high 
value of | r | does not imply a causal relationship between the 
two variables. 

(ii) The coefficient is subject to sampling errors which in 
small samples can be very large, e.g. the minimum values of 
| r | that are significant at the 5% level for 10, 20, 50, 100 pairs
of values drawn from a Normal bivariate population are 0.63,
0.44, 0.27 and 0.20 respectively.

(iii) Some properties of r should be established, e.g., (a) that it 
is unaffected by linear transformations of the variables, (b) that 
it is the geometric mean of the regression coefficients. 
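
Both properties can be illustrated numerically before they are established algebraically. The data below are invented, and the linear transformations used have positive multipliers (a negative multiplier applied to one variable would reverse the sign of r).

```python
def deviation_sums(xs, ys):
    """Return (Sxx, Syy, Sxy), the sums of squares and products about the means."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - m_x) ** 2 for x in xs)
    syy = sum((y - m_y) ** 2 for y in ys)
    sxy = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def correlation(xs, ys):
    """Product-moment coefficient of correlation."""
    sxx, syy, sxy = deviation_sums(xs, ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical paired observations (not from the text).
x_data = [61, 64, 67, 70, 73]
y_data = [118, 126, 131, 142, 148]

sxx, syy, sxy = deviation_sums(x_data, y_data)
b_yx, b_xy = sxy / sxx, sxy / syy
r = correlation(x_data, y_data)

# (b) r is the geometric mean of the regression coefficients (both positive here).
geometric_mean = (b_yx * b_xy) ** 0.5

# (a) r is unaffected by the transformations x -> 2x + 5 and y -> 0.5y - 3.
r_transformed = correlation([2 * x + 5 for x in x_data],
                            [0.5 * y - 3 for y in y_data])

print(f"r = {r:.4f}, geometric mean = {geometric_mean:.4f}, "
      f"r after transformation = {r_transformed:.4f}")
```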


4(d) The Rank Correlation Coefficient 


(i) The rank correlation coefficient is obtained as the product- 
moment correlation coefficient of the paired ranks of the data 
instead of the crude data. The derivation of the simplified 
formula and the procedure to be adopted when tied ranks 
occur are given in YK, pp. 261, 264. 

(ii) The probability distribution of the coefficient for small 
samples (no ties) provides interesting algebraic exercises. 


(iii) Algebraic proof that the coefficient ranges from +1 to
−1 forms a simple exercise.
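
The agreement between the two forms of the coefficient can be shown numerically. The simplified formula is not quoted in these notes; the sketch assumes it is the usual one, 1 − 6Σd²/(n(n² − 1)), where d is the difference between paired ranks. The marks are invented and contain no ties.

```python
def ranks(values):
    """Ranks 1 to n, with 1 for the largest value; assumes no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def product_moment_r(xs, ys):
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    sxx = sum((x - m_x) ** 2 for x in xs)
    syy = sum((y - m_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def rank_correlation(xs, ys):
    """Simplified formula 1 - 6*sum(d^2)/(n(n^2 - 1)) for untied ranks."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical marks of eight pupils in two subjects (no ties).
maths = [72, 65, 58, 81, 49, 90, 61, 54]
physics = [68, 70, 52, 75, 55, 88, 47, 60]

by_formula = rank_correlation(maths, physics)
by_product_moment = product_moment_r(ranks(maths), ranks(physics))
print(f"simplified formula: {by_formula:.4f}")
print(f"product-moment coefficient of the ranks: {by_product_moment:.4f}")

# The extreme cases: identical order and exactly reversed order.
print(rank_correlation(maths, maths), rank_correlation(maths, [-m for m in maths]))
```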


5. SIMPLE CONTINGENCY TABLES (YK, Ch. 1, 2)


(i) The syllabus item ‘simple contingency tables’ occurs
under the heading ‘Two-variate distributions’ and it is therefore
assumed that the theory of 2 × 2 contingency tables will
suffice.


(ii) A first need is to establish the notation (following YK): 


(a) A and B designate attributes; α and β the absence of
these attributes.


(b) (A) denotes the number of items with the attribute A; 
(AB) the number with both attributes, and so on. 




(iii) On the hypothesis of independence, expected frequen- 
cies are given by the entries in the table: 


                     Attribute A
Attribute B        A             α          Totals
B  . . . .     (A)(B)/N      (α)(B)/N        (B)
β  . . . .     (A)(β)/N      (α)(β)/N        (β)
Totals . .       (A)           (α)            N


(iv) Allowance for sampling fluctuations must be made in 
any discussion of results, but as no criteria for testing are 
required the expected and observed frequencies must differ 
markedly before any association can be presumed and the 
samples must be large. 

(v) A simple measure of association is helpful in discussion. 
That provided by:

Q = [(AB)(αβ) − (αB)(Aβ)] / [(AB)(αβ) + (αB)(Aβ)]

is suitable. It is zero if the attributes are independent and
ranges from +1 (complete positive association) to −1
(complete negative association). 
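
The behaviour of this measure at its three landmark values can be seen from a short sketch. The cell frequencies below are invented; the function name `coefficient_q` is illustrative.

```python
def coefficient_q(ab, a_beta, alpha_b, alpha_beta):
    """Q computed from the four cell frequencies (AB), (Aβ), (αB), (αβ)."""
    cross_1 = ab * alpha_beta
    cross_2 = alpha_b * a_beta
    return (cross_1 - cross_2) / (cross_1 + cross_2)

# Hypothetical 2 x 2 tables.
independent = coefficient_q(20, 40, 10, 20)  # (AB)(αβ) = (αB)(Aβ)
positive = coefficient_q(30, 0, 10, 20)      # no item has A without B
negative = coefficient_q(0, 40, 10, 20)      # no item has both A and B
print(independent, positive, negative)       # prints 0.0 1.0 -1.0
```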

(vi) An example: In an examination of 40 boys 22 passed in 
both Mathematics and Physics, 7 failed in both subjects and 
5 failed in Mathematics only. Complete the table, compare
these frequencies with those expected on the hypothesis of 
complete independence, and comment. The complete table is: 


                       Mathematics
Physics            Pass        Fail       Totals
Pass  . . . .     22 [19]      5 [8]        27
Fail  . . . .      6 [9]       7 [4]        13
Totals  . . .        28          12         40




The expected frequencies are 27 × 28/40, 12 × 27/40,
28 × 13/40 and 12 × 13/40, which to the nearest whole
numbers give the numbers in square brackets. The value of
Q is (22 × 7 − 5 × 6)/(22 × 7 + 5 × 6) = 0.67, which suggests
a fairly strong positive association between passes in
Mathematics and passes in Physics.

This example, with admittedly a small sample, illustrates
the difficulty of deciding without some criterion or some
experience how big a deviation from the expected value must
be before it can be stated with confidence that it is unlikely to
be a random fluctuation.
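
The working of this example can be set out as a short sketch. The observed frequencies are those of the completed table; the dictionary layout and names are illustrative.

```python
# Observed frequencies, keyed by (Physics result, Mathematics result).
observed = {("pass", "pass"): 22, ("pass", "fail"): 5,
            ("fail", "pass"): 6, ("fail", "fail"): 7}
n = sum(observed.values())

physics_totals = {p: sum(f for (pp, _), f in observed.items() if pp == p)
                  for p in ("pass", "fail")}
maths_totals = {m: sum(f for (_, mm), f in observed.items() if mm == m)
                for m in ("pass", "fail")}

# Expected frequency on the hypothesis of independence: row total x column total / N.
expected = {(p, m): physics_totals[p] * maths_totals[m] / n
            for p in ("pass", "fail") for m in ("pass", "fail")}

q = ((observed[("pass", "pass")] * observed[("fail", "fail")]
      - observed[("pass", "fail")] * observed[("fail", "pass")])
     / (observed[("pass", "pass")] * observed[("fail", "fail")]
        + observed[("pass", "fail")] * observed[("fail", "pass")]))

for cell in observed:
    print(cell, "observed", observed[cell], "expected", round(expected[cell]))
print(f"Q = {q:.2f}")  # prints "Q = 0.67"
```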


INDEX 


APPROXIMATIONS, 61 

Arithmetic mean, 15, 28, 53 

Association, 32, 77 

Average, 15, 28, 53 
moving, 7, 54 


BAR-CHART, 2

Bias, 43 

Binomial dist., 46, 62, 65 
Bivariate dist., 74 


CAR NUMBERS, 41
Charts, 2 
Class-data, 2 
Coefficient of correlation, 76 
of regression, 75 
of variation, 58 
Coin-tossing, 37 
Compound units, 14 
Confidence limits, 71 
Contingency, 76
Continuous dist., 59 
Correlation, 30, 76 
Covariance, 75 
Cumulative frequency, 53, 60 


DEGREES OF FREEDOM, 74 
Diagrams, 2 
Die-throwing, 38 
Dispersion, 23, 29, 56 


EXPECTATION, 61 
Experiments, 16, 17, 18 


Frequency Dist., 24, 52 
polygon, 52 


GEOMETRIC MEAN, 53
Graphical representation, 2, 51 


HISTOGRAM, 20, 52, 54


INDEX NUMBERS, 22, 55 


Maps, 11 

Mean deviation, 29, 56 
Median, 53, 60 

Mode, 53 

Moving average, 7, 54 


NORMAL DIST., 66
table, 77 


PERCENTAGE, 12 
Percentiles, 56 

Pie-chart, 2 

Poisson dist., 62 

Principle of least squares, 


Probability, 36, 60, 61 


RANDOM NUMBERS, 45 
Range, 19, 24, 56 
Rates, 12 

Ratios, 12 
Rectangular dist., 68 
Regression, 33, 74 
Relative frequency, 59 


SAMPLING, 42, 60

of attributes, 64 

of variables, 68 
Scatter diagram, 74 
Semi-interquartile range, 56 
Sheppard’s correction, 57 
Significance, 70 
Standard deviation, 56 

error, 66 

of means, 69 

measure, 57 

Stirling’s formula, 61 


VARIABLES, 52
Vital statistics, 56


WEIGHTED MEAN, 20, 53




AN INTRODUCTION TO
STATISTICAL METHOD


B. C. Brookes, M.A.


Lately Senior Mathematics Master, Bedford Modern School
Lecturer in the Presentation of Technical Information,
University College, London
Member of the Teaching Committee of the
Royal Statistical Society


and
W. F. L. Dick, M.A.
Technical Officer, Imperial Chemical Industries

This elementary introduction has been written to cover all school
syllabuses. It is also useful for anyone who needs a simple and readable
introduction to the subject. A selection of opinions is given on
the inside cover.

Unlike many other “introductions” this book really begins at the
beginning. It ranges from Descriptive Statistics to the Design of
Experiments. Very little previous mathematical knowledge is required
(G.C.E. Ordinary Level, though some elementary calculus methods are used).

A special feature is the collection of over 300 exercises and examples,
with answers and hints for solution. Many of these are not artificially
prepared, but are actual examples taken from the many fields of activity
where statistics are used today. They include practical experiments
which can be carried out by groups of students.

The authors have a wide experience, both of teaching statistical
methods at an elementary level and of applying them in practice.


Available in Two Editions:
Parts I and II together: 21s. net.
Part I alone: 9s. 6d.


For Opinions, see inside cover


WM. HEINEMANN LTD. 99 GT. RUSSELL ST., W.C.1


